Gaia/System/Keyboard/IME/Latin/Prediction & Auto Correction

The Latin IME of Gaia's built-in Keyboard App contains a prediction/auto-correction feature.
As its name implies, given a user-input word, it guesses what the user means to type (the
"auto-correction" sense), or what the user is planning to type (the "prediction" sense).
The feature relies on a dictionary (a list of words for a language) and a prediction engine
that uses heuristic algorithms to match the user input against the dictionary words.
 
A list of keyboard languages currently supported by Latin IME may be found at [[L10n:B2G/Fonts]].
 
This article contains introductory material on the data structures and algorithms upon which
the prediction engine is built, and offers illustrative examples where applicable.


== Data Structures and Algorithms ==
The goal for the prediction/matching process is to return a number of '''suggestion'''s which
are possible correct spellings of the user input. We generate such suggestions by growing
'''candidate'''s iteratively.

A candidate consists of:
# A prefix of a dictionary word.
# (Deriving from the last point) a specific TST node, reached by traversing from the TST root node down a specific number of levels of center nodes, and optionally moving horizontally with next pointers within a level.
# A suffix of the user input.
# Other auxiliary information, such as the weight of this candidate, detailed in the sub-sections below.


At each iteration, we generate new candidates:
# The current candidate is represented by a TST node; let's colloquially call the current candidate's prefix the "current prefix".
# There are a few dictionary word prefixes, represented by TST nodes, that may be reachable from the current candidate's TST node.
#* Let's colloquially call them "potential new prefixes".
#* Keep in mind that "reachable" includes both the center and the next pointers (details in later sections).
#* These potential new prefixes may become new individual candidates.
# Essentially, a potential new prefix differs from the current prefix by only one character: either a new char appended to the current prefix, or a change to the last char of the current prefix.
# A potential new prefix can only become a new candidate if this difference is allowed by '''GrowOperation'''s, a set of predefined rules for the matching process.
#* A '''GrowOperation''' determines eligibility by considering the nature of the new/changed character, the suffix of the user input, and the auxiliary information.
#* Usually, only the first char of the user input suffix is considered, since there is only one char of difference between the current prefix and the potential new prefix.
#* Only those potential new prefixes that fit a GrowOperation become new candidates, to be further grown in the next iteration.


In the TST example above, if the current candidate is "zar", then the "potential new prefixes" may be "zart" and "zal".


Usually, we go down center nodes. So, the resulting candidate's prefix is usually
one char longer than the previous candidate's prefix, and the resulting candidate's
user input suffix is usually one char shorter than the previous candidate's suffix.
When we finally run out of the user input suffix, the final candidates are sent back to Latin
IME as suggestions.


During this matching process, we associate a weight with each candidate we're
considering. The weight of a candidate is a collective product of that candidate's
frequency value (as stored in its TST node) and of how many and what kinds of GrowOperations
we have performed during the growth of the candidate to make it look like the user input.
We'll introduce the weighting details later in this section. To avoid exponentially growing our search space,
candidates and suggestions are stored in a bounded priority queue, their weight being
the priority. When we go from one iteration to the next, we always retrieve the
highest-priority candidates from the queue and grow them. Low-priority
candidates are dropped when the queue is full.
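The bounded priority queue just described can be sketched as follows. This is a minimal illustration, not the actual engine code; the names <code>BoundedPriorityQueue</code>, <code>add()</code> and <code>threshold</code> are ours (the article later mentions a <code>suggestions.threshold</code> playing the same role).

```javascript
// Minimal sketch of a bounded priority queue: items are kept ordered by
// weight (highest first), and the lowest-weight item is dropped once the
// capacity is exceeded.
function BoundedPriorityQueue(capacity) {
  this.capacity = capacity;
  this.items = [];       // kept sorted, highest weight first
  this.threshold = 0;    // weight of the lowest-priority item when full
}

BoundedPriorityQueue.prototype.add = function (item, weight) {
  // Insert in descending weight order.
  var i = 0;
  while (i < this.items.length && this.items[i].weight >= weight) i++;
  this.items.splice(i, 0, { item: item, weight: weight });
  // Drop the low-priority tail when over capacity.
  if (this.items.length > this.capacity) this.items.pop();
  if (this.items.length === this.capacity) {
    this.threshold = this.items[this.items.length - 1].weight;
  }
};

BoundedPriorityQueue.prototype.remove = function () {
  // Always retrieve the highest-priority item.
  var first = this.items.shift();
  return first ? first.item : undefined;
};
```

For example, with a capacity of 2, adding candidates of weights 5, 9 and 1 keeps only the two heaviest, and <code>remove()</code> returns them highest-weight first.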


Generally speaking, we have a few kinds of heuristic GrowOperations:
{| class="wikitable"
|-
! GrowOp !! Description !! Example
|-
| Insertion
||
The last char of the potential new prefix is not the first char of the user input suffix. In the next iteration, the user input suffix remains the same.
|style=white-space:nowrap|
* Dict word: "zarca"
* User input: "zaca"
* Old prefix: "za"
* User input suffix: "ca"
* New prefix: "zar"
* Next-iteration user input suffix: "ca"
|-
| Deletion
||
The last char of the potential new prefix is the second char of the user input suffix; we 'delete' the first char of the user input suffix by dropping the first & second chars in the user input suffix to form the next iteration's suffix.
 
Note that the candidate's prefix in the next iteration will contain the user input suffix second char in this iteration, so we drop two chars here.
|style=white-space:nowrap|
* Dict word: "zaar"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "za"
* Next-iteration user input suffix: "r"
|-
| Transposition
||
Conceptually, this is "''the last char of the potential new prefix is the second char of the user input suffix, and the center node of this TST node contains the first char of the user input suffix''".

However, in reality, we are not checking the "and" part. We merely allow the potential new prefix to become the candidate's new prefix in the next iteration, and drop the first char of the user input suffix. In the next iteration, if the first char of the user input suffix (which is also the first char of that in the current iteration) can't be matched against the new prefix's center node character without expensive GrowOperations, the grown candidate will likely be discarded according to the weight/correction limit, as detailed below.
|style=white-space:nowrap|
* Dict word: "zaacr"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zaa"
* Next-iteration user input suffix: "cr"
|-
| Substitution
||
We just drop the user input suffix's first char.
|style=white-space:nowrap|
* Dict word: "zaxar"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zax"
* Next-iteration user input suffix: "ar"
|}
 
Additionally, there is indeed a "non-heuristic" GrowOperation:
{| class="wikitable"
|-
! GrowOp !! Description !! Example
|-
| Match
||
The last char of the potential new prefix is the first char of the user input suffix.
|style=white-space:nowrap|
* Dict word: "zacry"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zac"
* Next-iteration user input suffix: "ar"
|}
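The tables above can be summarized by how each GrowOperation consumes the user input suffix. The sketch below is our illustration; the function and case names are hypothetical, not the engine's:

```javascript
// Hypothetical sketch: given a GrowOperation and the current user input
// suffix, compute the next-iteration user input suffix, matching the
// examples in the tables above. The new prefix itself comes from the TST
// node being examined and is not modeled here.
function nextSuffix(growOp, suffix) {
  switch (growOp) {
    case 'match':         // prefix char consumed one input char
    case 'substitution':  // input char replaced by prefix char, consumed
      return suffix.slice(1);
    case 'insertion':     // char inserted into the input; nothing consumed
      return suffix;
    case 'deletion':      // first input char deleted, second matched by prefix
      return suffix.slice(2);
    case 'transposition': // chars 1 and 2 swapped; char 2 consumed by prefix
      return suffix.charAt(0) + suffix.slice(2);
    default:
      throw new Error('unknown GrowOperation: ' + growOp);
  }
}
```

Checking against the examples: for suffix "car", Match and Substitution yield "ar", Deletion yields "r", Transposition yields "cr"; for Insertion with suffix "ca", the suffix stays "ca".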


In addition to the weighting mechanism, the prediction engine also imposes a maximum number
of '''correction'''s that can be made for a user input to match a specific ''suggestion''.
Most of the GrowOperations count as corrections, but some specific GrowOperations (such
as substituting a character with its diacritic variant or with its capitalized variant)
don't. Note we're talking about suggestions here -- a candidate growing into a new candidate
retains the information of the number of corrections we have had to make to match the user input.
During the process, when we need to make more corrections than allowed for the user input
to match a specific candidate, we drop that candidate from the queue right away.


Now back to the weight of the candidates. As said above, this weight is based on the word
frequency of the candidate's TST node, but it is reduced when we apply GrowOperations to the
user's input, to reflect our decreased confidence in making such a suggestion.
In the code, we define a few weight multipliers for the various kinds of GrowOperations.
These multipliers change a candidate's weight as it grows and as we go through the matching
process. Generally speaking, substituting variant forms and inserting apostrophes does not
reduce the weight by much at all. Substituting a nearby key reduces the weight moderately
(depending on how near the keys are on the screen). Insertions, deletions and transpositions
reduce the weight significantly. A character match, of course, does not reduce the weight at all.
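As a rough illustration, a candidate's weight can be modeled as its word frequency scaled by the product of the multipliers of the GrowOperations applied so far. The formula and names here are ours; the multiplier constants are the ones listed in the GrowOperations table later in this article.

```javascript
// Multiplier constants as listed in the GrowOperations table below;
// the map keys are our own labels.
var MULTIPLIERS = {
  match: 1,                   // a match costs nothing
  variantSubstitution: 0.99,  // VARIANT_FORM_MULTIPLIER
  punctuationInsertion: 0.95, // PUNCTUATION_INSERTION_MULTIPLIER
  wordExtension: 0.4,         // WORD_EXTENSION_MULTIPLIER
  insertion: 0.3,             // INSERTION_MULTIPLIER
  transposition: 0.3,         // TRANSPOSITION_MULTIPLIER
  substitution: 0.2,          // SUBSTITUTION_MULTIPLIER
  deletion: 0.1               // DELETION_MULTIPLIER
};

// Our simplified weight model: frequency times the product of all
// multipliers applied during the candidate's growth.
function candidateWeight(frequency, growOps) {
  return growOps.reduce(function (w, op) {
    return w * MULTIPLIERS[op];
  }, frequency);
}
```

For example, a word of frequency 31 reached via one Match and one Substitution ends up with weight 31 × 1 × 0.2 = 6.2.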


The prediction engine holds a <code>nearbyKeys</code> structure which contains
spatial information of keys -- which keys are near each other, and how near they are.
Such information is provided by the Latin IME, in turn provided by the Keyboard App's
layout rendering management logic. As said above, the spatial information is used to weigh
substitutions.




==== Function Call Breakdown ====
We're not doing a line-by-line code walkthrough here. This is an illustration of which function calls which, and of what each does when called, without a few technical subtleties; it is meant to paint (only) the overall picture of the functionality/responsibility of each function.
 
===== <code>predict()</code> and <code>getSuggestions()</code> =====
 
<code>predict()</code> is called by the Latin IME and is the commencement of the whole prediction process. Here, we're provided with the user input and the max numbers of suggestions, candidates, and corrections for each suggestion. The numbers of suggestions and candidates limit our search space, and it should be no surprise that the number of candidates should be larger than that of suggestions. The max number of corrections limits the difference between the user input and the suggestions; we usually allow only 1 or 2 corrections.


This function synchronously sets up some data structures before triggering <code>getSuggestions()</code> asynchronously. <code>getSuggestions()</code> may be regarded as "step 2" of <code>predict()</code> and is only executed once in the whole prediction process. Here, we check for early-return cases (where the user input contains chars that aren't even in the dictionary, or the user input is longer than the longest word in the dictionary). Then, we begin the growing process by adding the first candidate into the candidate queue and calling <code>processCandidates()</code>.


And that first candidate is: an empty prefix, with the whole user input string as the suffix, at the root node of the TST. As explained below, we go through the list of next-pointed nodes in an iteration, so we'll get some potential new prefixes from the next nodes of the root node (recall that all these next nodes are at the same level as the root node) that fit some GrowOperations for the first char of the user input.
 
===== The Growing Process =====
As outlined in the previous sub-section, the growing process heuristically applies GrowOperations to a candidate and evaluates them against the user input. It's called growing because we carry out this process character by character, from an empty string (= TST root node) to a complete dictionary word (= as we level down to a terminal node).
 
The growing process consists of <code>addCandidate()</code>, <code>processCandidates()</code> and <code>process()</code> cycles. As we've just called <code>addCandidate()</code> at <code>getSuggestions()</code>, let's begin with <code>processCandidates()</code>.
 
<code>processCandidates()</code> iterates through candidates in the queue, and calls <code>process()</code> on each candidate. For each candidate, <code>process()</code> applies GrowOperations to the prefix to match the user input suffix and generate new candidates according to various pieces of information, and calls <code>addCandidate()</code> to add the new candidates to the queue; or, it calls <code>addSuggestion()</code> if a candidate has grown into a suggestion. Not surprisingly, the new candidates will be processed the next time <code>processCandidates()</code> retrieves them from the queue.
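This cycle can be sketched as a synchronous loop. The real code re-triggers itself asynchronously with <code>setTimeout()</code> and tries many GrowOperations; in this sketch (entirely ours), only Match is modeled, the queue is a plain array, and a candidate is just <code>{ prefix, remaining, weight }</code>.

```javascript
// Heavily simplified sketch of the addCandidate()/processCandidates()/
// process() cycle: drain the candidate queue, turning each exhausted
// candidate into a suggestion and growing the others by one character.
function runPrediction(queue, isWord, suggestions) {
  var candidate;
  while ((candidate = queue.shift()) !== undefined) {
    // process(): the candidate has consumed the whole user input...
    if (candidate.remaining.length === 0) {
      if (isWord(candidate.prefix)) suggestions.push(candidate.prefix);
      continue;
    }
    // ...or it grows; the only "GrowOperation" modeled here is Match.
    queue.push({
      prefix: candidate.prefix + candidate.remaining.charAt(0),
      remaining: candidate.remaining.slice(1),
      weight: candidate.weight
    });
  }
  return suggestions;
}
```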


Aside from the candidate information described in the big picture sub-section, <code>process()</code> uses these auxiliary pieces of information to apply GrowOperations to candidates:
# The number of corrections we have made during the growth of the candidate, as we can't make more corrections than the max allowed.
# The TST node's word frequency and some other weight information (details below in <code>process()</code> and <code>addCandidate()</code>).
# The character of the TST node, of its next nodes, and of the center nodes of this TST node and its next nodes; this affects how we weigh GrowOperations.
# The difference between the length of the user input and the levels-down of the TST node (details in the list of GrowOperations below).


The growing process is heuristic, and it halts when we don't have any more candidates to grow in <code>processCandidates()</code>, or when the highest-priority candidate has a weight that is too low.


Again, when a candidate's growing process has just started, its prefix is an empty string and the suffix is the user input. When it ends (and we find some suggestion), the prefix is the suggestion that the candidate has grown into, and the suffix is a possibly empty string.


====== <code>processCandidates()</code> ======
We retrieve a certain number of candidates from the queue. As the queue is a priority queue, we always retrieve the highest-priority candidate on each queue <code>remove()</code> call.


As said above, the heuristic process halts when we don't have any more candidates from the queue, or when the weight of the highest-priority candidate is too low. "Too low" means the candidate's weight is lower than that of the least-priority suggestion in the suggestion queue (i.e. <code>suggestions.threshold</code>), which means the candidate has no chance of growing into a suggestion that can be inserted into the suggestion queue.
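Assuming a candidate queue ordered highest-priority first, the halting test just described amounts to the following sketch (ours, not the engine's code):

```javascript
// Sketch of the halting test: a candidate whose weight is already below
// the suggestion queue's threshold can never enter the suggestions, so
// there is no point in growing it (or anything behind it) further.
function shouldHalt(candidateQueue, suggestionThreshold) {
  var top = candidateQueue[0]; // queue ordered highest priority first
  return top === undefined || top.weight < suggestionThreshold;
}
```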


The heuristic process halts by <code>processCandidates()</code> returning without calling <code>setTimeout(processCandidates)</code> to re-trigger itself.


Note that the idea of "one" <code>processCandidates()</code> call does not map to the idea of "one" growth iteration for candidates. The number of candidates to retrieve in each <code>processCandidates()</code> call is a fixed constant, only relevant to UX responsiveness (how frequently we want to relinquish our execution back to the JS engine's event queue).


====== <code>addCandidate()</code> ======
A candidate, along with its related information, is added to the candidate queue here. The information, which <code>process()</code> relies on and as briefly listed above, formally includes (in function argument order):
* <code>pointer</code>: The TST node the candidate's prefix is associated with.
* <code>output</code>: The (current) prefix that is growing. This will loosely resemble, if not match, a prefix of the user input.
* <code>remaining</code>: The user input suffix, i.e. the remaining part of the input we haven't attempted to match.
* <code>multiplier</code>: The multiplicative product of the multipliers from different GrowOperations we've performed during the growth of the candidate.
* <code>frequency</code>: The candidate node's word frequency.
* <code>corrections</code>: The number of corrections we have made during the growth of the candidate.
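A sketch of the record built from these arguments follows. The argument names match the list above; the body and the weight formula are our simplification, not the actual code.

```javascript
// Sketch: build the candidate record that gets enqueued, from the
// arguments listed above.
function makeCandidate(pointer, output, remaining,
                       multiplier, frequency, corrections) {
  return {
    pointer: pointer,              // the TST node of the prefix
    output: output,                // the growing prefix
    remaining: remaining,          // user input suffix not yet matched
    corrections: corrections,      // corrections made so far
    weight: frequency * multiplier // simplified priority for the queue
  };
}
```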


We may also apply a couple of weight boost mechanisms here.


====== <code>process()</code> ======
<code>process()</code> is the core of the growing process. It performs various GrowOperations on one candidate and calls <code>addCandidate()</code> with the results; in our words, we say the candidate has gone through one growth iteration. If we have reached the end of the user input, then the candidate is supplied to <code>addSuggestion()</code> to be added to the suggestion queue.


In order to see what we may potentially grow the candidate into, <code>process()</code> goes through the nodes linked by the next pointers of the candidate's associated TST node (and the TST node itself), to form various "potential new prefixes". Let's see a quick example of a single iteration.


# Consider the TST in the beginning of the article. Suppose the user input is "zola".
# Suppose at the previous iteration, we added a candidate of: prefix "z", user input suffix "ola", with the Match GrowOperation. From the previous iteration to this iteration, we go down the TST by one level and we're now at the [i -> a -> s] level. The node is 'i'.
# At the current iteration, <code>output</code> is "z" and <code>remaining</code> is "ola". We're at the "i" node. At this moment, we want to loop through all the next-nodes for 'i', i.e. 'i', 'a', 's', and see if the GrowOperations work out on the candidate.
# As it turns out, we'll only have:
#* Three Substitution GrowOperations, which result in three new candidates with prefixes 'zi', 'za', 'zs', all having suffix 'la'. The TST nodes associated with the candidates are the subtrees rooted at 'i', 'a', and 's'.
#** If we're specifically talking about a QWERTY keyboard, then, within these substitutions, "zi" will weigh more than the other substitutions, as 'i' and 'o' are nearby.
#* Three Insertion GrowOperations, which result in three new candidates with prefixes 'zi', 'za', 'zs', all having suffix 'ola'. The TST nodes associated with the candidates are the subtrees rooted at 'i', 'a', and 's'.
# Finally, if we use the default weighting and <code>maxCorrection</code> parameters (which is 1), of all the six new candidates, only the Substitution GrowOperation for 'i' will actually fully grow into a suggestion. Indeed, our tree doesn't have other dictionary words that are fewer than two corrections from "zola".


A list of GrowOperations will be introduced in a later section, including their weight multipliers, whether they count as corrections, and so on.


In each <code>process()</code> call, we may try more than one GrowOperation for a candidate (and call <code>addCandidate()</code> for each applicable GrowOperation result).


It's worth noting that not all GrowOperations are mutually exclusive, and for those that are, we always check whether a specific GrowOperation is feasible before checking another. This is codified in <code>process()</code>. Some GrowOperations are coded within an if-then-elseif-else chain, alongside other GrowOperations that themselves form a separate if-then-else chain. Note that the order of GrowOperations within one if-then-else chain is important: we always test for the one with the higher multiplier first. Still others place a <code>continue</code> statement that bypasses all remaining GrowOperations for this next node of this candidate.


====== <code>addSuggestion()</code> ======
Similar to <code>addCandidate()</code>, <code>addSuggestion()</code> adds the grown suggestion to the suggestion queue. A few subtleties exist:
# If the user input has its first letter capitalized, we capitalize the first letter of the suggestion too.
#* Note that the dictionary defines words of different capitalization as different words, possibly with different weights. However, there are some situations where we might want the first letter of a word to be capitalized, while contextually the word should be regarded as a lowercase word in the dictionary. A good example is the beginning of a sentence.
# We may get duplicated suggestions, grown from different candidates. We keep the higher-priority one, for sure.
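The two subtleties can be sketched as follows; this is our illustration, not the actual <code>addSuggestion()</code> implementation.

```javascript
// Sketch of the two subtleties above: mirror a capitalized user input,
// and deduplicate suggestions by keeping the higher-weight copy.
function addSuggestion(suggestions, userInput, word, weight) {
  // 1. Mirror a capitalized first letter of the user input.
  if (userInput.charAt(0) !== userInput.charAt(0).toLowerCase()) {
    word = word.charAt(0).toUpperCase() + word.slice(1);
  }
  // 2. Deduplicate, keeping the higher-priority (higher-weight) entry.
  for (var i = 0; i < suggestions.length; i++) {
    if (suggestions[i].word === word) {
      if (weight > suggestions[i].weight) suggestions[i].weight = weight;
      return suggestions;
    }
  }
  suggestions.push({ word: word, weight: weight });
  return suggestions;
}
```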


==== List of GrowOperations ====
This section details each GrowOperation, including whether it counts as a correction and the associated multiplier. You may find the GrowOp names in the actual code.

{| class="wikitable"
|-
! GrowOp !! Description !! Correction? !! Multiplier !! Example
|-
| Match
||
The last char of the potential new prefix is the first char of the user input suffix.
||
No
||
1
|style=white-space:nowrap|
* Dict word: "zacry"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zac"
* Next-iteration user input suffix: "ar"
|-
| Insertion-Punctuation
||
The last char of the potential new prefix is a punctuation character (an apostrophe, for example), and is not the first char of the user input suffix. In the next iteration, the user input suffix remains the same.
||
No
||
<code>PUNCTUATION_INSERTION_MULTIPLIER</code><br />
= 0.95
|style=white-space:nowrap|
* Dict word: "za'ca"
* User input: "zaca"
* Old prefix: "za"
* User input suffix: "ca"
* New prefix: "za'"
* Next-iteration user input suffix: "ca"
|-
| Insertion-Any
||
(This is the "regular" Insertion GrowOp as explained in the [[#Big Picture & Terminology]] section.)
||
Yes
||
<code>INSERTION_MULTIPLIER</code><br />
= 0.3
|style=white-space:nowrap|
* Dict word: "zarca"
* User input: "zaca"
* Old prefix: "za"
* User input suffix: "ca"
* New prefix: "zar"
* Next-iteration user input suffix: "ca"
|-
| Insertion-Extend
||
The user input suffix is empty, but the TST node is not a terminal node. We effectively append the TST node's char to generate the new prefix.
||
No
||
<code>WORD_EXTENSION_MULTIPLIER</code><br />
= 0.4
|style=white-space:nowrap|
* Dict word: "zarcat"
* User input: "zarc"
* Old prefix: "zarc"
* User input suffix: ""
* New prefix: "zarca"
* Next-iteration user input suffix: ""
|-
| Deletion
||
(This is the "regular" Deletion GrowOp as explained in the [[#Big Picture & Terminology]] section.)
||
Yes
||
<code>DELETION_MULTIPLIER</code><br />
= 0.1
|style=white-space:nowrap|
* Dict word: "zaar"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "za"
* Next-iteration user input suffix: "r"
|-
| Deletion-AtEnd
||
This is mostly the same as the regular Deletion except that it's specifically for when we arrive at the terminal node while still having one character in the user input suffix. The rationale is the same: think of the user input suffix's second char as "nothing", just as we have no center pointer for this node. The code logic for the regular Deletion GrowOp would fail here, hence the special case.
||
Yes
||
<code>DELETION_MULTIPLIER</code><br />
= 0.1
|style=white-space:nowrap|
* Dict word: "zaar"
* User input: "zaare"
* Old prefix: "zaar"
* User input suffix: "e"
* New prefix: "zaar"
* Next-iteration user input suffix: ""
|-
| Transposition
||
(Same as explained in the [[#Big Picture & Terminology]] section.)
||
Yes
||
<code>TRANSPOSITION_MULTIPLIER</code><br />
= 0.3
|style=white-space:nowrap|
* Dict word: "zaacr"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zaa"
* Next-iteration user input suffix: "cr"
|-
| Substitution-Any
||
(This is the "regular" Substitution GrowOp as explained in the [[#Big Picture & Terminology]] section.)
||
Yes
||
<code>SUBSTITUTION_MULTIPLIER</code><br />
= 0.2
|style=white-space:nowrap|
(QWERTY KEYBOARD)
* Dict word: "zaxar"
* User input: "zapar"
* Old prefix: "za"
* User input suffix: "par"
* New prefix: "zax"
* Next-iteration user input suffix: "ar"
|-
| Substitution-Variant
||
This is a special Substitution case: The last char of the potential new prefix is a diacritic form of the user input's first char, and/or they differ in capitalization. The diacritic part is unidirectional: we perform this GrowOp if the dict word char is a diacritic and the user input char is "plain", but not if the dict word char is "plain" and the user input char is a diacritic.
||
No
||
<code>VARIANT_FORM_MULTIPLIER</code><br />
= 0.99
|style=white-space:nowrap|
* Dict word: "zaçar"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zaç"
* Next-iteration user input suffix: "ar"
|-
| Substitution-Near
||
This is a special Substitution case where the last char of the potential new prefix is near the user input's first char as dictated by <code>nearbyKeys</code>.
||
Yes
||
Whichever is larger:<br />
Nearness<br />
or<br />
<code>SUBSTITUTION_MULTIPLIER</code><br />
= 0.2
|style=white-space:nowrap|
(QWERTY KEYBOARD)
* Dict word: "zaxar"
* User input: "zacar"
* Old prefix: "za"
* User input suffix: "car"
* New prefix: "zax"
* Next-iteration user input suffix: "ar"
|}


==== Nearness ====
The <code>nearbyKeys</code> structure is generated by [https://github.com/mozilla-b2g/gaia/blob/master/apps/keyboard/js/imes/latin/latin.js latin.js] in <code>nearbyKeys()</code>. The following example lists the keys near 'g', along with their nearness, in the QWERTY en-US layout, as obtained from FxOS running on a [https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame Flame].
 f: 0.701416015625
 h: 0.701416015625
 v: 0.49862291921977125
 t: 0.423379813234514
 y: 0.423379813234514
 b: 0.29144229503160113
 c: 0.29144229503160113
 r: 0.19181633764327974
 u: 0.19181633764327974
 d: 0.17535400390625
 j: 0.17535400390625
Note that in the actual code, this concept is usually called "distance", but it really measures the degree of closeness, since the value is used directly as a multiplier when performing the Substitution GrowOp.
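To make these numbers concrete, here is a hedged sketch of how such a closeness score could be derived from key geometry. The key coordinates and the fall-off formula below are invented for illustration; the actual computation lives in latin.js's <code>nearbyKeys()</code> and produces the values listed above.

```javascript
// Illustrative sketch only: nearness is a closeness score in (0, 1].
// Adjacent keys score high; farther keys score low. The coordinates
// (a QWERTY fragment around 'g' on a unit grid) and the quadratic
// fall-off are assumptions, not the engine's real geometry or formula.
const keyCenters = {
  g: { x: 4.5, y: 1 },
  f: { x: 3.5, y: 1 }, // horizontally adjacent to 'g'
  h: { x: 5.5, y: 1 },
  t: { x: 4.0, y: 0 },
  y: { x: 5.0, y: 0 },
  v: { x: 4.0, y: 2 },
  b: { x: 5.0, y: 2 }, // diagonal neighbour of 'g'
};

// One plausible shape: closeness falls off with squared distance.
function nearness(a, b) {
  const ka = keyCenters[a], kb = keyCenters[b];
  const dx = ka.x - kb.x, dy = ka.y - kb.y;
  return Math.max(0, 1 - (dx * dx + dy * dy) * 0.3);
}

console.log(nearness('g', 'f')); // adjacent key: high closeness
console.log(nearness('g', 'b')); // diagonal key: lower closeness
```

The key property the engine relies on is only that a physically closer key yields a larger multiplier, so a Substitution toward a neighbouring key is penalized less.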


==== Step-by-Step Examples ====
In this section, we walk through prediction examples for a couple of user-input words against a tiny TST tree, showing the candidates' growth iterations and the state of the candidate and suggestion queues at each step. Since our word list is very short, we limit the candidate queue's capacity to 10 and the suggestion queue's to 2, in order to demonstrate the idea of a bounded priority queue and the early-return scenarios. We allow a maximum of 1 correction.


These illustrative examples do not include all GrowOperation instances.
We'll be operating on a regular QWERTY keyboard; distance information is retrieved from FxOS running on Flame.
Here is our list of words and frequencies, directly excerpted from our en_US wordlist:
{| class="wikitable"
|-
! Word !! Frequency !! Normalized Frequency
|-
| Alps || 97 || 13
|-
| aplenty || 55 || 7
|-
| apple || 107 || 14
|-
| apply || 127 || 16
|-
| ogre || 77 || 10
|-
| oral || 119 || 15
|-
| organic || 123 || 16
|}
Which gives this TST:
 [a      ->      o      ->      A]
   |                |                |
 [p]              [r    ->    g]  [l]
   |                |          |    |
 [p    ->    l]  [g -> a]  [r]  [p]
   |          |    |    |    |    |
 [l]        [e]  [a]  [l]  [e]  [s]
   |          |    |
 [y -> e]  [n]  [n]
             |    |
             [t]  [i]
             |    |
             [y]  [c]
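This word list and tree can be sketched as a small ternary search tree in JavaScript. The node shape and insertion routine below are illustrative assumptions; the real engine operates on a serialized binary blob, but the center/next traversal and the per-node "most-frequent word reachable from here" annotation work the same way.

```javascript
// Minimal TST sketch. Each node has a char, a center pointer (next char
// of the word), a next pointer (alternative chars at the same level),
// and the max frequency of any word passing through it.
function makeNode(ch) { return { ch, center: null, next: null, frequency: 0 }; }

function insert(root, word, freq) {
  if (!root) root = makeNode(word[0]);
  let node = root;
  while (node.ch !== word[0]) {           // walk the level's next chain
    if (!node.next) node.next = makeNode(word[0]);
    node = node.next;
  }
  node.frequency = Math.max(node.frequency, freq); // best word below here
  if (word.length > 1) node.center = insert(node.center, word.slice(1), freq);
  return root;
}

// Follow next pointers within a level, center pointers between levels.
function findPrefix(root, prefix) {
  let level = root, node = null;
  for (const ch of prefix) {
    node = level;
    while (node && node.ch !== ch) node = node.next;
    if (!node) return null;               // prefix is not a candidate
    level = node.center;
  }
  return node;
}

let root = null;
const words = [['Alps', 13], ['aplenty', 7], ['apple', 14], ['apply', 16],
               ['ogre', 10], ['oral', 15], ['organic', 16]];
for (const [w, f] of words) root = insert(root, w, f);

console.log(findPrefix(root, 'app').frequency); // 16: best word below is "apply"
console.log(findPrefix(root, 'org').frequency); // 16: "organic"
console.log(findPrefix(root, 'apx'));           // null: not a prefix of any word
```

The <code>frequency</code> annotation is what the walkthrough's footnote [†] refers to: a candidate's weight is estimated from the most frequent word still reachable below its node.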
 
===== Example user-input ''"apple"'' =====
 
A set of rowspan'ed rows denotes one growth iteration of a specific old candidate, possibly into more than one new candidate. We artificially assign candidate IDs for clarity in this documentation. A newly added candidate is shown in bold in the Candidates Queue column.
 
The table is long, so the footnotes are put here:
 
[†] 16 is the frequency of the most-frequent word that we can traverse (downwards) from this [a], corresponding to "apply".
 
[‡] Nearness is used as multiplier.
 
[*] EOW denotes a terminal node. It's omitted from the tree above, but provides illustrative assistance to determine the end of a dictionary word in our steps.
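The queue behavior visible throughout the table (items kept sorted by weight; a full queue rejects anything at or below its current minimum and otherwise bumps that minimum off) can be sketched as a bounded priority queue. The class below is an illustrative assumption, not the engine's actual implementation:

```javascript
// Bounded priority queue sketch: highest weight first; capacity-limited.
class BoundedQueue {
  constructor(capacity) { this.capacity = capacity; this.items = []; }
  add(item) { // item: { id, weight }
    const min = this.items[this.items.length - 1];
    if (this.items.length === this.capacity && item.weight <= min.weight) {
      return false;                          // below threshold: rejected
    }
    this.items.push(item);
    this.items.sort((a, b) => b.weight - a.weight);
    if (this.items.length > this.capacity) this.items.pop(); // bump lowest
    return true;
  }
  pop() { return this.items.shift(); }       // best candidate grows next
}

// Mirroring the first few candidates of the walkthrough, capacity 3:
const q = new BoundedQueue(3);
q.add({ id: 'C2', weight: 116 });
q.add({ id: 'C3', weight: 4.8 });
q.add({ id: 'C4', weight: 3.2 });
console.log(q.add({ id: 'low', weight: 1.0 }));   // false: below threshold
console.log(q.add({ id: 'C6', weight: 112.87 })); // true: bumps C4 off
console.log(q.pop().id);                          // 'C2'
```

A real implementation would use a heap rather than re-sorting an array, but the bump-off and threshold semantics shown here are what the "Candidates Queue" column illustrates.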
 
{| class="wikitable"
|-
!colspan="2"| Old<br />Candidate !!colspan="5"| Potential New Candidate !!rowspan="2"| Explanation !!rowspan="2" style="white-space:nowrap"| Candidates Queue<br />(node, output, rem, cor, multi, weight) !!rowspan="2"| Suggestions Queue
|-
! ID !! TST Node<br />and<br /><span style="white-space:nowrap">center node</span> !!<code>output</code> !! <code>remaining</code> !! Cumulative<br /><code>corrections</code> !! <code>multiplier</code> !! <code>word<br />frequency</code>
|-
||  ||  || "" || "apple" || 0 || 1 || 1 || The first candidate by <code>getSuggestions()</code> || '''C1: ([a], "", "apple", 0, 1, 1)''' ||
|-
|rowspan="6"| C1 ||rowspan="2"|
[a]
  &#124;
[p]
|| "a" || "pple" || 0 || 1 || 16 [†] || Match with a weight boost at <code>addCandidate()</code> || '''C2: ([p], "a", "pple", 0, 1, 16 + 100)''' ||
|-
|| "a" || "apple" || 1 || 0.3 || 16 || Insertion-Any ||
C2: ([p], "a", "pple", 0, 1, 116)<br />
'''C3: ([p], "a", "apple", 1, 0.3, 16 * 0.3)'''
||
|-
|rowspan="2"|
[o]
  &#124;
[r]
|| "o" || "pple" || 1 || 0.2 || 16 || Followed next from [a] to [o]; Substitute-Any ||
C2: ([p], "a", "pple", 0, 1, 116)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
'''C4: ([r], "o", "pple", 1, 0.2, 16 * 0.2)'''
||
|-
|| "o" || "apple" || 1 || 0.3 || 16 || Insertion-Any ||
C2: ([p], "a", "pple", 0, 1, 116)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
'''C5: ([r], "o", "apple", 1, 0.3, 16 * 0.3)'''<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|rowspan="2"|
[A]
  &#124;
[l]
|| "A" || "pple" || 0 || 0.99 || 13 || Followed next from [o] to [A]; Substitution-Variant. Weight boost at <code>addCandidate()</code> ||
C2: ([p], "a", "pple", 0, 1, 116)<br />
'''C6: ([l], "A", "pple", 0, 0.99, 13 * 0.99 + 100)'''<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|| "A" || "apple" || 1 || 0.3 || 13 ||  Insertion-Any ||
C2: ([p], "a", "pple", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
'''C7: ([l], "A", "apple", 1, 0.3, 13 * 0.3)'''<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|rowspan="4"| C2 ||rowspan="4"|
[p]
  &#124;
[p]
|| "ap" || "ple" || 0 || 1 || 16 || Match with a weight boost at <code>addCandidate()</code> ||
'''C8: ([p], "ap", "ple", 0, 1, 16 + 100)'''<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|| "ap" || "pple" || 1 || 0.3 || 16 || Insertion-Any ||
C8: ([p], "ap", "ple", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
'''C9: ([p], "ap", "pple", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|| "ap" || "ple" || 1 || 0.3 || 16 || Transposition of this [p]'s center child, also a [p] ||
C8: ([p], "ap", "ple", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
'''C10: ([p], "ap", "ple", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|| "ap" || "le" || 1 || 0.1 || 16 || Deletion of this [p], and go to this [p]'s center child, also a [p] ||
C8: ([p], "ap", "ple", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
'''C11: ([p], "ap", "le", 1, 0.1, 16 * 0.1)'''
||
|-
|rowspan="6"| C8 ||rowspan="2"|
[p]
  &#124;
[l]
|| "app" || "le" || 0 || 1 || 16 || Match with a weight boost at <code>addCandidate()</code> ||
'''C12: ([l], "app", "le", 0, 1, 16 + 100)'''<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
C11: ([p], "ap", "le", 1, 0.1, 1.6)
||
|-
|| "app" || "ple" || 1 || 0.3 || 16 || Insertion-Any<br />At this moment, our candidate queue is full and threshold is 1.6. ||
C12: ([l], "app", "le", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
'''C13: ([l], "app", "ple", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
C11: ([p], "ap", "le", 1, 0.1, 1.6)
||
|-
|rowspan="4"|
[l]
  &#124;
[e]
|| "apl" || "le" || 1 || 0.423 [‡] || 7 || Followed next from [p] to [l]; Substitute-Near<br />This bumps off C11. ||
C12: ([l], "app", "le", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
'''C14: ([e], "apl", "le", 1, 0.42, 7 * 0.423)'''
||
|-
|| "apl" || "ple" || 1 || 0.3 || 7 || Try performing Insertion-Any;<br/>since this results in too low a weight, it's not added. ||  
C12: ([l], "app", "le", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
C14: ([e], "apl", "le", 1, 0.42, 2.96)
||
|-
|| "apl" || "pe" || 1 || 0.3 || 7 || Try performing Transposition;<br/>Since this results in too low a weight, it's not added. ||  
C12: ([l], "app", "le", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
C14: ([e], "apl", "le", 1, 0.42, 2.96)
||
|-
|| "apl" || "e" || 1 || 0.1 || 7 || Try performing Deletion;<br/>Since this results in too low a weight, it's not added. ||  
C12: ([l], "app", "le", 0, 1, 116)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
C14: ([e], "apl", "le", 1, 0.42, 2.96)
||
|-
|rowspan="2"| C12 ||rowspan="2"|
[l]
  &#124;
[y]
|| "appl" || "e" || 0 || 1 || 16 || Match with a weight boost at <code>addCandidate()</code> ||
'''C15: ([y], "appl", "e", 0, 1, 16 + 100)'''<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
C14: ([e], "apl", "le", 1, 0.42, 2.96)
||
|-
|| "appl" || "le" || 1 || 0.3 || 16 || Insertion-Any<br />This bumps C14 off. ||
C15: ([y], "appl", "e", 0, 1, 16 + 100)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
'''C16: ([y], "appl", "le", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)
||
|-
|rowspan="4"| C15 ||rowspan="2"|
[y]
  &#124;
EOW
[*]
|| "apply" || "" || 1 || 0.2 || 16 || Substitution-Any ||
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C4: ([r], "o", "pple", 1, 0.2, 3.2)<br />
'''C17: (EOW, "apply", "", 1, 0.2, 16 * 0.2)'''
||
|-
|| "apply" || "e" || 1 || 0.3 || 16 || Insertion-Any<br />This bumps either C4 or C17 off, say we bump C4 here. ||
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
'''C18: (EOW, "apply", "e", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)<br />
C17: (EOW, "apply", "", 1, 0.2, 3.2)
||
|-
|rowspan="2"|
[e]
  &#124;
EOW
|| "apple" || "" || 0 || 1 || 14 || Followed next from [y] to [e], Match with a weight boost at <code>addCandidate()</code><br />This bumps C17 off. ||
'''C19: (EOW, "apple", "", 0, 1, 14 + 100)'''<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C18: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C7: ([l], "A", "apple", 1, 0.3, 3.9)
||
|-
|| "apple" || "e" || 1 || 0.3 || 14 || Insertion-Any<br />This bumps C7 off. ||
C19: (EOW, "apple", "", 0, 1, 114)<br />
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
'''C20: (EOW, "apple", "e", 1, 0.3, 4.2)'''
||
|-
|| C19 ||
  EOW
|| "apple" || "" ||  ||  ||  || This is a grown suggestion. ||
C6: ([l], "A", "pple", 0, 0.99, 112.87)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
|style="white-space:nowrap"|
'''C19: (apple, 14)'''
|-
|rowspan="2"| C6 ||rowspan="2"|
[l]
  &#124;
[e]
|| "Al" || "ple" || 1 || 0.423 || 13 || Substitution-Near ||style="white-space:nowrap"|
'''C21: ([e], "Al", "ple", 1, 0.99 * 0.423, 13 * 0.99 * 0.423)'''<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
||
C19: (apple, 14)
|-
|| "Al" || "pple" || 1 || 0.3 || 13 || Insertion-Any ||
C21: ([e], "Al", "ple", 1, 5.44)<br />
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
'''C22: ([e], "Al", "pple", 1, 0.3, 13 * 0.3)'''
||
C19: (apple, 14)
|-
|| C21 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C3: ([p], "a", "apple", 1, 0.3, 4.8)<br />
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C3 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C5: ([r], "o", "apple", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C5 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C9: ([p], "ap", "pple", 1, 0.3, 4.8)<br />
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|rowspan="2"| C9 ||rowspan="2"|
[p]
  &#124;
[l]
|| "app" || "ple" || 1 || 1 || 16 || Match ||
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
'''C23: ([l], "app", "ple", 1, 0.3 * 1, 16 * 0.3 * 1)'''<br/>
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C10: ([p], "ap", "ple", 1, 0.3, 4.8)<br />
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C23: ([l], "app", "ple", 1, 0.3, 4.8)<br/>
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|rowspan="2"| C10 ||rowspan="2"|
[p]
  &#124;
[l]
|| "app" || "le" || 1 || 1 || 16 || Match ||
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C23: ([l], "app", "ple", 1, 0.3, 4.8)<br/>
'''C24: ([l], "app", "le", 1, 0.3 * 1, 16 * 0.3 * 1)'''<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C13: ([l], "app", "ple", 1, 0.3, 4.8)<br />
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C23: ([l], "app", "ple", 1, 0.3, 4.8)<br/>
C24: ([l], "app", "le", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C13 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint ||
C16: ([y], "appl", "le", 1, 0.3, 4.8)<br />
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C23: ([l], "app", "ple", 1, 0.3, 4.8)<br/>
C24: ([l], "app", "le", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C16 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint ||
C17: (EOW, "apply", "e", 1, 0.3, 4.8)<br />
C23: ([l], "app", "ple", 1, 0.3, 4.8)<br/>
C24: ([l], "app", "le", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C17 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint ||
C23: ([l], "app", "ple", 1, 0.3, 4.8)<br/>
C24: ([l], "app", "le", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C23 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint ||
C24: ([l], "app", "le", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|rowspan="2"| C24 ||rowspan="2"|
[l]
  &#124;
[e]
|| "appl" || "e" || 1 || 1 || 16 || Match ||
'''C25: ([e], "appl", "e", 1, 0.3 * 1, 16 * 0.3 * 1)'''<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C25: ([e], "appl", "e", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|rowspan="2"| C25 ||rowspan="2"|
[e]
  &#124;
EOW
|| "apple" || "" || 1 || 1 || 16 || Match ||
'''C26: (EOW, "apple", "", 1, 0.3 * 1, 16 * 0.3 * 1)'''<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C26: (EOW, "apple", "", 1, 0.3, 4.8)<br />
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C26 ||
  EOW
|| "apple" || "" ||  ||  ||  || Grown candidate, but the word is already in Suggestion Queue with a larger weight, so do nothing. ||
C20: (EOW, "apple", "e", 1, 0.3, 4.2)<br />
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C20 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C22: ([e], "Al", "pple", 1, 0.3, 3.9)
||
C19: (apple, 14)
|-
|| C22 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
||
C19: (apple, 14)
|-
|}
So we end up with one suggestion: "apple".
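The weight arithmetic running through the table above can be summarized in a small helper. The function name is illustrative, and the +100 exact-match boost value is inferred from the table's "16 + 100 = 116" entries; in the real code the boost is applied inside <code>addCandidate()</code>.

```javascript
// A candidate's weight in the walkthrough:
//   newMultiplier = oldMultiplier * growOpMultiplier
//   weight        = wordFrequency * newMultiplier (+ boost on exact Match)
// The 100 boost value is an assumption read off the table, not a named
// constant we can point at here.
function candidateWeight(wordFreq, oldMultiplier, opMultiplier, isExactMatch) {
  const multiplier = oldMultiplier * opMultiplier;
  const boost = isExactMatch ? 100 : 0;
  return { multiplier, weight: wordFreq * multiplier + boost };
}

// C2 in the "apple" walkthrough: Match on 'a', freq 16 -> 16 + 100 = 116.
console.log(candidateWeight(16, 1, 1, true).weight);
// C3: Insertion-Any with INSERTION_MULTIPLIER = 0.3 -> 16 * 0.3 = 4.8.
console.log(candidateWeight(16, 1, 0.3, false).weight);
```

Note how the multiplier is cumulative: C21's weight above is 13 * 0.99 * 0.423, the Substitution-Variant multiplier from an earlier iteration times the Substitution-Near multiplier of this one.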


===== Example user-input ''"orfanic"'' =====


{| class="wikitable"
|-
!colspan="2"| Old<br />Candidate !!colspan="5"| Potential New Candidate !!rowspan="2"| Explanation !!rowspan="2" style="white-space:nowrap"| Candidates Queue<br />(node, output, rem, cor, multi, weight) !!rowspan="2"| Suggestions Queue
|-
! ID !! TST Node<br />and<br /><span style="white-space:nowrap">center node</span> !!<code>output</code> !! <code>remaining</code> !! Cumulative<br /><code>corrections</code> !! <code>multiplier</code> !! <code>word<br />frequency</code>
|-
||  ||  || "" || "orfanic" || 0 || 1 || 1 || The first candidate by <code>getSuggestions()</code> || '''C1: ([a], "", "orfanic", 0, 1, 1)''' ||
|-
|rowspan="6"| C1 ||rowspan="2"|
[a]
  &#124;
[p]
|| "a" || "rfanic" || 1 || 0.2 || 16 || Substitution-Any || '''C2: ([p], "a", "rfanic", 1, 0.2, 16 * 0.2)''' ||
|-
|| "a" || "orfanic" || 1 || 0.3 || 16 || Insertion-Any ||
'''C3: ([p], "a", "orfanic", 1, 0.3, 16 * 0.3)'''<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
||
|-
|rowspan="2"|
[o]
  &#124;
[r]
|| "o" || "rfanic" || 0 || 1 || 16 || Followed next from [a] to [o]; Match with a weight boost at <code>addCandidate()</code> ||
'''C4: ([r], "o", "rfanic", 0, 1, 16 + 100)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
||
|-
|| "o" || "orfanic" || 1 || 0.3 || 16 || Insertion-Any ||
C4: ([r], "o", "rfanic", 0, 1, 116)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
'''C5: ([r], "o", "orfanic", 1, 0.3, 16 * 0.3)'''<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
||
|-
|rowspan="2"|
[A]
  &#124;
[l]
|| "A" || "rfanic" || 1 || 0.2 || 13 || Followed next from [o] to [A]; Substitution-Any ||
C4: ([r], "o", "rfanic", 0, 1, 116)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
'''C6: ([l], "A", "rfanic", 1, 0.2, 13 * 0.2)'''
||
|-
|| "A" || "orfanic" || 1 || 0.3 || 13 || Insertion-Any ||
C4: ([r], "o", "rfanic", 0, 1, 116)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
'''C7: ([l], "A", "orfanic", 1, 0.3, 13 * 0.3)'''<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
||
|-
|rowspan="4"| C4 ||rowspan="2"|
[r]
  &#124;
[g]
|| "or" || "fanic" || 0 || 1 || 16 || Match ||
'''C8: ([g], "or", "fanic", 0, 1, 16 + 100)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
||
|-
|| "or" || "rfanic" || 1 || 0.3 || 16 || Insertion-Any ||
C8: ([g], "or", "fanic", 0, 1, 116)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
'''C9: ([g], "or", "rfanic", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
||
|-
|rowspan="2"|
[g]
  &#124;
[r]
|| "og" || "fanic" || 1 || 0.2 || 10 || Followed next from [r] to [g]; Substitution-Near, but g-r distance is 0.1918, which is less than 0.2, so we use 0.2 as multiplier. ||
C8: ([g], "or", "fanic", 0, 1, 116)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)<br />
'''C10: ([r], "og", "fanic", 1, 0.2, 10 * 0.2)'''
||
|-
|| "og" || "rfanic" || 1 || 0.3 || 10 || Insertion-Any ||
C8: ([g], "or", "fanic", 0, 1, 116)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
'''C11: ([r], "og", "rfanic", 1, 0.3, 10 * 0.3)'''<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)<br />
C10: ([r], "og", "fanic", 1, 0.2, 2.0)
||
|-
|rowspan="6"| C8 ||rowspan="2"|
[g]
  &#124;
[a]
|| "org" || "anic" || 1 || 0.7014 || 16 || Substitution-Near ||
'''C12: ([a], "org", "anic", 1, 0.7014, 16 * 0.7014)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)<br />
C10: ([r], "og", "fanic", 1, 0.2, 2.0)
||
|-
|| "org" || "fanic" || 1 || 0.3 || 16 || Insertion-Any; Queue is now full, threshold = 2.0. ||
C12: ([a], "org", "anic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
'''C13: ([r], "org", "fanic", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)<br />
C10: ([r], "og", "fanic", 1, 0.2, 2.0)
||
|-
|rowspan="4"|
[a]
  &#124;
[l]
|| "ora" || "anic" || 1 || 0.2 || 15 || Followed next from [g] to [a]; Substitution-Any<br />This bumps C10 off. ||
C12: ([a], "org", "anic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)<br />
'''C14: ([l], "ora", "anic", 1, 0.2, 15 * 0.2)'''<br />
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
||
|-
|| "ora" || "fanic" || 1 || 0.3 || 15 || Insertion-Any; C6 is bumped off. ||
C12: ([a], "org", "anic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
'''C15: ([l], "ora", "fanic", 1, 0.3, 15 * 0.3)'''<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)<br />
C14: ([l], "ora", "anic", 1, 0.2, 3.0)
||
|-
|| "ora" || "fnic" || 1 || 0.3 || 15 || Transposition; C14 is bumped off. ||
C12: ([a], "org", "anic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
'''C16: ([l], "ora", "fnic", 1, 0.3, 15 * 0.3)'''<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|| "ora" || "nic" || 1 || 0.1 || 15 || Try performing Deletion;<br />since this results in too low a weight, it's not added. ||
C12: ([a], "org", "anic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|rowspan="2"| C12 ||rowspan="2"|
[a]
  &#124;
[n]
|| "orga" || "nic" || 1 || 1 || 16 || Match ||style="white-space:nowrap"|
'''C17: ([n], "orga", "nic", 1, 0.7014 * 1, 16 * 0.7014 * 1)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C17: ([n], "orga", "nic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|rowspan="2"| C17 ||rowspan="2"|
[n]
  &#124;
[i]
|| "organ" || "ic" || 1 || 1 || 16 || Match ||style="white-space:nowrap"|
'''C18: ([i], "organ", "ic", 1, 0.7014 * 1, 16 * 0.7014 * 1)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C18: ([i], "organ", "ic", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|rowspan="2"| C18 ||rowspan="2"|
[i]
  &#124;
[c]
|| "organi" || "c" || 1 || 1 || 16 || Match ||style="white-space:nowrap"|
'''C19: ([c], "organi", "c", 1, 0.7014 * 1, 16 * 0.7014 * 1)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C19: ([c], "organi", "c", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|rowspan="2"| C19 ||rowspan="2"|
[c]
  &#124;
EOW
|| "organic" || "" || 1 || 1 || 16 || Match ||style="white-space:nowrap"|
'''C20: (EOW, "organic", "", 1, 0.7014 * 1, 16 * 0.7014 * 1)'''<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C20: (EOW, "organic", "", 1, 0.7014, 11.22)<br />
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
|-
|| C20 ||
  EOW
|| "organic" || "" ||  ||  ||  || This is a grown suggestion. ||
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)<br />
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
|style="white-space:nowrap"|
'''C20: (organic, 11.22)'''
|-
|| C3 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)<br />
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C5 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)<br />
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C9 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C13: ([r], "org", "fanic", 1, 0.3, 4.8)<br />
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C13 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)<br />
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C15 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)<br />
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C16 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)<br />
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C7 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)<br />
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C2 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|rowspan="2"| C11 ||rowspan="2"|
[r]
  &#124;
[e]
|| "ogr" || "fanic" || 1 || 1 || 10 || Match ||style="white-space:nowrap"|
'''C21: ([e], "ogr", "fanic", 1, 0.3 * 1, 10 * 0.3 * 1)'''<br />
||
C20: (organic, 11.22)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C21: ([e], "ogr", "fanic", 1, 0.3, 3.0)
||
C20: (organic, 11.22)
|-
|| C21 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
||
C20: (organic, 11.22)
|-
|}
So we come up with a suggestion: organic.
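The weight column in the walkthrough above is simply the word frequency of the candidate's subtree multiplied by the cumulative multiplier. A minimal sketch (the function name is illustrative, not the engine's actual API):

```python
# Illustrative sketch of how a candidate's weight is computed in the
# walkthrough above; the function name is hypothetical, not the engine's API.

def candidate_weight(word_frequency, multiplier):
    # Weight = subtree word frequency scaled by the product of all
    # GrowOperation multipliers applied while growing the candidate.
    return word_frequency * multiplier

# C12 from the table: frequency 16, cumulative multiplier 0.7014
print(round(candidate_weight(16, 0.7014), 2))  # 11.22
```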
 
===== Example user-input ''"aplen"'' =====
[*] This is an over-simplification: the actual code does not drop duplicated candidates. The simplification would only matter if the queue ever became full again; since it won't in this example, the simplified steps remain accurate.


{| class="wikitable"
|-
!colspan="2"| Old<br />Candidate !!colspan="5"| Potential New Candidate !!rowspan="2"| Explanation !!rowspan="2" style="white-space:nowrap"| Candidates Queue<br />(node, output, rem, cor, multi, weight) !!rowspan="2"| Suggestions Queue
|-
! ID !! TST Node<br />and<br /><span style="white-space:nowrap">center node</span> !!<code>output</code> !! <code>remaining</code> !! Cumulative<br /><code>corrections</code> !! <code>multiplier</code> !! <code>word<br />frequency</code>
|-
||  ||  || "" || "aplen" || 0 || 1 || 1 || The first candidate by <code>getSuggestions()</code> || '''C1: ([a], "", "aplen", 0, 1, 1)''' ||
|-
|rowspan="6"| C1 ||rowspan="2"|
[a]
  &#124;
[p]
|| "a" || "plen" || 0 || 1 || 16 || Match with a weight boost at <code>addCandidate()</code> || '''C2: ([p], "a", "plen", 0, 1, 16 + 100)''' ||
|-
|| "a" || "aplen" || 1 || 0.3 || 16 || Insertion-Any ||
C2: ([p], "a", "plen", 0, 1, 116)<br />
'''C3: ([p], "a", "aplen", 1, 0.3, 16 * 0.3)'''
||
|-
|rowspan="2"|
[o]
  &#124;
[r]
|| "o" || "plen" || 1 || 0.2 || 16 || Followed next from [a] to [o]; Substitution-Any ||
C2: ([p], "a", "plen", 0, 1, 116)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
'''C4: ([r], "o", "plen", 1, 0.2, 16 * 0.2)'''
||
|-
|| "o" || "aplen" || 1 || 0.3 || 16 || Insertion-Any ||
C2: ([p], "a", "plen", 0, 1, 116)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
'''C5: ([r], "o", "aplen", 1, 0.3, 16 * 0.3)'''<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|rowspan="2"|
[A]
  &#124;
[l]
|| "A" || "plen" || 0 || 0.99 || 13 || Followed next from [o] to [A]; Substitution-Variant. Weight boost at <code>addCandidate()</code> ||
C2: ([p], "a", "plen", 0, 1, 116)<br />
'''C6: ([l], "A", "plen", 0, 0.99, 13 * 0.99 + 100)'''<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| "A" || "aplen" || 1 || 0.3 || 13 ||  Insertion-Any ||
C2: ([p], "a", "plen", 0, 1, 116)<br />
C6: ([l], "A", "plen", 0, 0.99, 112.87)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
'''C7: ([l], "A", "aplen", 1, 0.3, 13 * 0.3)'''<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|rowspan="2"| C2 ||rowspan="2"|
[p]
  &#124;
[p]
|| "ap" || "len" || 0 || 1 || 16 || Match with a weight boost at <code>addCandidate()</code> ||
'''C8: ([p], "ap", "len", 0, 1, 16 + 100)'''<br />
C6: ([l], "A", "plen", 0, 0.99, 112.87)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| "ap" || "plen" || 1 || 0.3 || 16 || Insertion-Any ||
C8: ([p], "ap", "len", 0, 1, 116)<br />
C6: ([l], "A", "plen", 0, 0.99, 112.87)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
'''C9: ([p], "ap", "plen", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|rowspan="3"| C8 ||
[p]
  &#124;
[l]
|| "app" || "len" || 1 || 0.3 || 16 || Insertion-Any ||
C6: ([l], "A", "plen", 0, 0.99, 112.87)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
'''C10: ([l], "app", "len", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|rowspan="2"|
[l]
  &#124;
[e]
|| "apl" || "en" || 0 || 1 || 7 || Followed next from [p] to [l]; Match with a weight boost at <code>addCandidate()</code> ||
C6: ([l], "A", "plen", 0, 0.99, 112.87)<br />
'''C11: ([e], "apl", "en", 0, 1, 7 + 100)'''<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| "apl" || "len" || 1 || 0.3 || 7 ||  Insertion-Any ||
C6: ([l], "A", "plen", 0, 0.99, 112.87)<br />
C11: ([e], "apl", "en", 0, 1, 107)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)<br />
'''C12: ([e], "apl", "len", 1, 0.3, 7 * 0.3)'''
||
|-
|rowspan="4"| C6 ||rowspan="4"|
[l]
  &#124;
[p]
|| "Al" || "len" || 1 || 0.423 || 13 || Substitution-Near ||
C11: ([e], "apl", "en", 0, 1, 107)<br />
'''C13: ([p], "Al", "len", 1, 0.423, 13 * 0.423)'''<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)<br />
C12: ([e], "apl", "len", 1, 0.3, 2.1)
||
|-
|| "Al" || "plen" || 1 || 0.3 || 13 || Insertion-Any<br />At this moment, our candidate queue is full and the threshold is 2.1.||
C11: ([e], "apl", "en", 0, 1, 107)<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
'''C14: ([p], "Al", "plen", 1, 0.3, 13 * 0.3)'''<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)<br />
C12: ([e], "apl", "len", 1, 0.3, 2.1)
||
|-
|| "Al" || "pen" || 1 || 0.3 || 13 || Transposition<br />This bumps off C12. ||
C11: ([e], "apl", "en", 0, 1, 107)<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
'''C15: ([p], "Al", "pen", 1, 0.3, 3.9)'''<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| "Al" || "en" || 1 || 0.1 || 13 || Try performing Deletion;<br/>since this results in too low a weight, it's not added. ||
C11: ([e], "apl", "en", 0, 1, 107)<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|rowspan="2"| C11 ||rowspan="2"|
[e]
  &#124;
[n]
|| "aple" || "n" || 0 || 1 || 7 || Match with a weight boost at <code>addCandidate()</code> ||
'''C16: ([n], "aple", "n", 0, 1, 7 + 100)'''<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| "aple" || "en" || 1 || 0.3 || 7 || Try performing Insertion-Any;<br/>since this results in too low a weight, it's not added. ||
C16: ([n], "aple", "n", 0, 1, 107)<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|rowspan="2"| C16 ||rowspan="2"|
[n]
  &#124;
[t]
|| "aplen" || "" || 0 || 1 || 7 || Match with a weight boost at <code>addCandidate()</code> ||
'''C17: ([t], "aplen", "", 0, 1, 7 + 100)'''<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| "aplen" || "n" || 1 || 0.3 || 7 || Try performing Insertion-Any;<br/>since this results in too low a weight, it's not added. ||
C17: ([t], "aplen", "", 0, 1, 107)<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| C17 ||
[t]
  &#124;
[y]
|| "aplent" || "" || 0 || 0.4 || 7 || Insertion-Extend, with a weight boost at <code>addCandidate()</code> ||
'''C18: ([y], "aplent", "", 0, 0.4, 7 * 0.4 + 100)'''<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| C18 ||
[y]
  &#124;
EOW
|| "aplenty" || "" || 0 || 0.4 || 7 || Insertion-Extend, with a smaller weight boost at <code>addCandidate()</code> ||style="white-space:nowrap"|
'''C19: (EOW, "aplenty", "", 0, 0.4 * 0.4, 7 * 0.4 * 0.4 + (7 / 32) * 10)'''<br />
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
|-
|| C19 ||
  EOW
|| "aplenty" || "" ||  ||  ||  || This is a grown suggestion. ||
C13: ([p], "Al", "len", 1, 0.423, 5.5)<br />
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
|style="white-space:nowrap"|
'''C19: (aplenty, 3.31)'''
|-
|| C13 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C3: ([p], "a", "aplen", 1, 0.3, 4.8)<br />
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C3 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C5: ([r], "o", "aplen", 1, 0.3, 4.8)<br />
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C5 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C9: ([p], "ap", "plen", 1, 0.3, 4.8)<br />
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|rowspan="2"| C9 ||rowspan="2"|
[p]
  &#124;
[l]
|| "app" || "len" || 1 || 1 || 16 || Match. The resulting candidate is the same as C10 so we're dropping it here [*]  ||
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C10: ([l], "app", "len", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|rowspan="2"| C10 ||rowspan="2"|
[l]
  &#124;
[e]
|| "appl" || "en" || 1 || 1 || 16 || Match ||
'''C20: ([e], "appl", "en", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C20: ([e], "appl", "en", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|rowspan="2"| C20 ||rowspan="2"|
[e]
  &#124;
EOW
|| "apple" || "n" || 1 || 1 || 16 || Match ||
'''C21: ([e], "apple", "n", 1, 0.3, 16 * 0.3)'''<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C21: ([e], "apple", "n", 1, 0.3, 4.8)<br />
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C21 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint.<br />If we allowed 2 corrections, though, we could perform a Deletion-AtEnd here, which would eventually grow into a suggestion. ||
C7: ([l], "A", "aplen", 1, 0.3, 3.9)<br />
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C7 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C14: ([p], "Al", "plen", 1, 0.3, 3.9)<br />
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|rowspan="2"| C14 ||rowspan="2"|
[p]
  &#124;
[s]
|| "Alp" || "len" || 1 || 1 || 13 || Match ||
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
'''C22: ([s], "Alp", "len", 1, 0.3, 13 * 0.3)'''<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C15: ([p], "Al", "pen", 1, 0.3, 3.9)<br />
C22: ([s], "Alp", "len", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|rowspan="2"| C15 ||rowspan="2"|
[p]
  &#124;
[s]
|| "Alp" || "en" || 1 || 1 || 13 || Match ||
C22: ([s], "Alp", "len", 1, 0.3, 3.9)<br />
'''C23: ([s], "Alp", "en", 1, 0.3, 13 * 0.3)'''<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|colspan="6"| Other GrowOps will violate <code>maxCorrections</code> constraint. ||
C22: ([s], "Alp", "len", 1, 0.3, 3.9)<br />
C23: ([s], "Alp", "en", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C22 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C23: ([s], "Alp", "en", 1, 0.3, 3.9)<br />
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C23 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
C4: ([r], "o", "plen", 1, 0.2, 3.2)
||
C19: (aplenty, 3.31)
|-
|| C4 ||colspan="7"| We can't grow without violating <code>maxCorrections</code> constraint. ||
||
C19: (aplenty, 3.31)
|-
|}
We thus come up with a suggestion: aplenty.
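The final weight 3.31 of C19 can be reproduced from the numbers in the table. Note that the boost formulas below are read off the worked rows above (+100 for a perfect match, a smaller frequency-scaled boost for Insertion-Extend reaching EOW); they are illustrative and are not quoted from the engine's code:

```python
# Reproducing C19's weight from the "aplen" walkthrough above. The boost
# formula is read off the table rows, not quoted from the engine's code.

frequency = 7                          # word frequency of "aplenty"
multiplier = 0.4 * 0.4                 # two Insertion-Extend GrowOps, no corrections
smaller_boost = (frequency / 32) * 10  # reduced boost: not a perfect match

weight = frequency * multiplier + smaller_boost
print(round(weight, 2))  # 3.31, matching the suggestion (aplenty, 3.31)
```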
==== Misc Bits and Technical Details ====
* We don't have a "whitelist" of punctuation marks (a whitelist meaning marks for which insertion GrowOperations don't count as a correction). Rather, one should think of the list of alphabet letters and their diacritic forms (called <code>variants</code> in the code) as a ''graylist'': for these, insertion GrowOperations reduce a growing candidate's weight and count as corrections.
** This can lead to unexpected prediction results when the word list contains characters which are not in <code>variants</code>, but which should not conceptually be treated as whitelisted punctuation marks either. Remember that we whitelist punctuation marks because some languages use them extensively while they may not always be on the default (alphabet) keyboard layout, hence the no-penalty policy.
*** {{bug|1127776}} is an example, where "33" can get corrected to "$US" on the French keyboard ("$US" is in the French word list).
*** To avoid such situations in the future, we need to "graylist" such a punctuation mark by putting it into <code>rootToAccentedForm</code> (from which <code>variants</code> is derived), as done in the bug.
* As described above, <code>predict()</code> returns to its caller (and the JS event queue) while the prediction process resumes via <code>setTimeout()</code>. Each <code>processCandidates()</code> call, after processing a specific number of candidates, likewise returns to the JS event queue and resumes via <code>setTimeout()</code>. Suggestions may take tens of milliseconds to grow, so these methods are designed to be asynchronous, and we periodically give other code a chance to run. This also gives the prediction process a chance to be <code>abort()</code>'ed.
* A candidate's weight is artificially boosted at <code>addCandidate()</code> if we have not made any corrections during its growth. This ensures that if the user is typing an infrequent word (but typing it correctly), we don't bump the actual word off the list when lots of frequent words have similar spellings. Still, since an Insertion-Extend GrowOperation doesn't count as a correction yet yields less than a perfect match, we apply a smaller boost to candidates grown with such a GrowOp, which we detect by checking the multiplier.
* We can determine whether a priority queue is full by checking its threshold: if the threshold is 0, the queue is not full yet.
** The threshold is the priority of the lowest-priority item in the queue.
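The queue behavior seen throughout the walkthroughs (the threshold, and bumping the lowest-weight candidate off when a heavier one arrives) can be sketched as a bounded priority queue. The class below is an illustrative model, not the engine's implementation:

```python
import heapq

# Illustrative model of the candidates/suggestions priority queues above,
# not the engine's implementation. The threshold is 0 while the queue is
# not full; once full, it is the weight of the lowest-weight item, and a
# new item must beat it to bump that item off.

class BoundedPriorityQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # min-heap of weights; the root is the lowest weight

    @property
    def threshold(self):
        return self.heap[0] if len(self.heap) == self.capacity else 0

    def add(self, weight):
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, weight)
        elif weight > self.threshold:
            heapq.heapreplace(self.heap, weight)  # bump off the lowest item

q = BoundedPriorityQueue(3)
q.add(4.8); q.add(3.0)
print(q.threshold)  # 0 -- not full yet
q.add(2.0)
print(q.threshold)  # 2.0 -- full now
q.add(4.5)          # bumps the 2.0 item off
print(q.threshold)  # 3.0
```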
== Dictionary Blob Structure ==
{{main|Gaia/System/Keyboard/IME/Latin/Dictionary_Blob}}

Storing a plain-text, human-readable dictionary of words and the TST is, of course,
not optimal for processing by computer programs, in terms of both storage space
utilization and computation performance. Thus, the data structures are stored in a
binary dictionary blob that offers better efficiency.

The [https://github.com/mozilla-b2g/gaia/blob/master/apps/keyboard/js/imes/latin/dictionaries/xml2dict.py xml2dict.py] script
processes the word list, generates the TST along with supporting metadata and auxiliary data structures, and
serializes them into a compact binary blob, which is written out as the *.dict file, to be deserialized and used by
the prediction engine.

This procedure is independent of the Gaia build process. If you change a word list's XML file, you must
manually run xml2dict.py to generate a new corresponding dict file, and then check the new file in to
the repository.

== Known Issues ==
=== Same XML, Different Dict Blob ===
* This was tested with Python 3.4.0 on Ubuntu 14.04 (Python 3 is required by the Makefile).
* It is possible that two runs of the XML-to-dict conversion on the same word-list XML produce different dictionary binary blobs, causing git to see the *.dict file as modified.
For example, the following commands end with <code>diff</code> reporting that the files differ:
# rm en.dict if make doesn't want to make a new en.dict
$ make en.dict
$ mv en.dict en.dict.old
$ make en.dict
$ diff en.dict en.dict.old
Binary files en.dict and en.dict.old differ
* '''The difference does not affect any functionality in any way.'''
* We investigate further by locating the difference:
$ xxd en.dict > en.dict.xxd
$ xxd en.dict.old > en.dict.old.xxd
$ diff en.dict.xxd en.dict.old.xxd
27,29c27,29
< 00001a0: 0300 e000 0000 0300 2400 0000 0200 3300  ........$.....3.
< 00001b0: 0000 0200 e400 0000 0200 3200 0000 0100  ..........2.....
< 00001c0: e500 0000 0100 eb00 0000 0100 ee00 0000  ................
---
> 00001a0: 0300 e000 0000 0300 e400 0000 0200 3300  ..............3.
> 00001b0: 0000 0200 2400 0000 0200 3200 0000 0100  ....$.....2.....
> 00001c0: eb00 0000 0100 e500 0000 0100 ee00 0000  ................
* We first determine that the difference is located in the character table:
** Insert <code>print(output.tell())</code> in the <code>emit()</code> function, before and after the first for-loop.
* Now notice the actual difference: the bytes at {0x1a8 to 0x1ad} and those at {0x1b4 to 0x1b9} are interchanged: [2400 0000 0200] and [e400 0000 0200]. The same holds for the bytes at {0x1c0 to 0x1c5} and at {0x1c6 to 0x1cb}.
* This is because <code>characterFrequency.items()</code> in <code>emit()</code> does not necessarily produce the same ordering on each run. That <code>sorted()</code> is a stable sort does not help: entries with equal frequency keep their (nondeterministic) input order.
* Let's insert <code>from pprint import pprint</code> and <code>pprint(list(characterFrequency.items()))</code> in the <code>emit()</code> function. The results of two runs are:
{| class="wikitable"
|-
! First run !! Second run
|-
|
[('é', 220),
('L', 2372),
('-', 277),
('j', 2151),
('M', 4099),
('å', 1),
('H', 2115),
('W', 1544),
('è', 29),
('G', 2251),
('f', 15984),
('a', 110508),
('z', 5884),
('â', 8),
('v', 12864),
('V', 924),
('Y', 325),
('à', 3),
('s', 131933),
('ü', 17),
('m', 35668),
('n', 94375),
('r', 98945),
('w', 11092),
('i', 113155),
('o', 85820),
('h', 30601),
('R', 2268),
('û', 1),
('S', 4682),
('t', 85186),
('ï', 4),
('e', 147827),
('q', 2139),
('I', 1213),
('ë', 1),
('X', 105),
('Q', 205),
('l', 71483),
('F', 1783),
('A', 3463),
('ö', 11),
('K', 1575),
('k', 13233),
('g', 34650),
('D', 2396),
('N', 1525),
('O', 1209),
('P', 2837),
('ñ', 21),
('d', 45495),
('ê', 18),
('B', 3676),
('2', 1),
('3', 2),
('Z', 314),
('u', 43362),
('É', 3),
('ç', 12),
('ó', 7),
('J', 1081),
("'", 30168),
('C', 4323),
('c', 50164),
('b', 24532),
('E', 1684),
('$', 2),
('x', 3605),
('î', 1),
('á', 10),
('p', 33944),
('ô', 6),
('U', 602),
('T', 2515),
('y', 22361),
('ä', 2)]
||


[('î', 1),
('ó', 7),
('Y', 325),
('å', 1),
('c', 50164),
('ä', 2),
('J', 1081),
('F', 1783),
('l', 71483),
('é', 220),
('U', 602),
('s', 131933),
('x', 3605),
('ô', 6),
('-', 277),
('ë', 1),
('j', 2151),
('ê', 18),
('y', 22361),
('i', 113155),
('K', 1575),
('r', 98945),
('D', 2396),
('ö', 11),
('O', 1209),
('k', 13233),
('t', 85186),
('B', 3676),
('p', 33944),
('h', 30601),
('$', 2),
('É', 3),
('u', 43362),
('3', 2),
('A', 3463),
('Z', 314),
('H', 2115),
('f', 15984),
('T', 2515),
('I', 1213),
('n', 94375),
('e', 147827),
('â', 8),
('b', 24532),
('Q', 205),
('ç', 12),
('G', 2251),
('W', 1544),
('g', 34650),
('è', 29),
("'", 30168),
('R', 2268),
('S', 4682),
('P', 2837),
('2', 1),
('o', 85820),
('ñ', 21),
('z', 5884),
('w', 11092),
('X', 105),
('ï', 4),
('v', 12864),
('V', 924),
('N', 1525),
('á', 10),
('û', 1),
('m', 35668),
('M', 4099),
('C', 4323),
('à', 3),
('L', 2372),
('d', 45495),
('ü', 17),
('q', 2139),
('E', 1684),
('a', 110508)]
|}
* Let's look at the three tuples with frequency = 2, namely ('3', 2), ('$', 2), and ('ä', 2) (in the first run). In the second run, however, the ordering is ('ä', 2), ('$', 2), and ('3', 2). Thus, the written files differ.
* Let's go back to the xxd diff result above and recall that the difference is the interchanging of [2400 0000 0200] and [e400 0000 0200]. These bytes encode ('$', 2) and ('ä', 2). Also, between these addresses is [3300 0000 0200], encoding ('3', 2). So we have a change of order among characters ''of the same frequency'' -- which does not affect any functionality, since no particular order of same-frequency characters is expected in the character table.
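The effect is easy to reproduce in isolation. The snippet below is a sketch, not the xml2dict.py code; the character-keyed tie-break at the end is one possible fix for making the output deterministic, not something the script currently does:

```python
# Stable sorting preserves the *input* order of equal-frequency items,
# so two runs that iterate a dict in different orders can emit
# ('$', 2) and ('ä', 2) in either order.
run1 = [('3', 2), ('$', 2), ('ä', 2)]
run2 = [('ä', 2), ('$', 2), ('3', 2)]

by_freq = lambda item: -item[1]
assert sorted(run1, key=by_freq) != sorted(run2, key=by_freq)

# Adding the character itself as a secondary sort key makes the output
# order deterministic, regardless of the dict's iteration order.
by_freq_then_char = lambda item: (-item[1], item[0])
assert sorted(run1, key=by_freq_then_char) == sorted(run2, key=by_freq_then_char)
```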


== Ideas for Future Improvements ==
See also: [[Gaia/System/Keyboard/IME/Latin/Dictionary_Blob#Ideas_for_Future_Improvements]]
=== Agglutinative Languages ===
* How do we predict for [http://en.wikipedia.org/wiki/Agglutinative_language agglutinative languages]? See {{bug|1139255}}.
** According to the bug comments, naïve appending a word onto another word in the dictionaries won't work ideally.
=== Combining Characters ===
* [http://en.wikipedia.org/wiki/Combining_character Combining characters] aren't handled very gracefully right now. These characters take separate code points, occupy separate char positions in a string, but appear as one "letter" and may be lexically one "letter" in some specific language.
* Example: 'री', in [http://en.wikipedia.org/wiki/Devanagari Devanagari] (the alphabet used for Hindi), is composed of 'र' ('\u0930') and 'ी' ('\u0940').
** (Alright, I know Hindi is not covered in Latin IME. But still...)
* Currently our engine would regard two combining characters as... two independent characters (no surprise), and they take the space of two nodes.
* However, ideally, if the two combining characters represent one single letter in a language, our engine should regard them as one character as though they're one char node.
* How?
=== Emoji and Non-BMP Characters ===
Unfortunately, not all Unicode characters can fit in the 2-byte code space.
The [http://en.wikipedia.org/wiki/Emoji#Blocks emoji blocks] in the Unicode [http://en.wikipedia.org/wiki/Plane_(Unicode)#Supplementary_Multilingual_Plane SMP plane] are a very good example:
The emoji characters occupy code points U+1F300 to U+1F6FF, and an emoji character cannot fit within two bytes.
* If you run xml2dict.py through a word list containing an emoji, then at the line where a character code point value is serialized into a 16-bit integer: <code>output.write(struct.pack(">H", ord(item[0])))</code>, you get a warning.
** And the resulting dictionary blob is, of course, not correct.
* If you run word_list_converter.js through a word list containing an emoji, then -- alright, JavaScript uses UCS-2 to encode Unicode code points (tongue-twisting sentence, I know). So those emoji characters get broken into [http://en.wikipedia.org/wiki/UTF-16#U.2B010000_to_U.2B10FFFF surrogate pairs].
** For example, this earth emoji '🌏', with code point value U+1F30F, is broken into the surrogate pair '\uD83C\uDF0F' in UCS-2.
** This will (conveniently) allow us to squeeze an emoji character into our current dictionary blob structure without modification. However, we'll still run into the same issue as with "Combining Characters": the engine needs to know that a surrogate pair, albeit taking the space of two nodes, is one single letter.
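Both behaviours can be sketched in a few lines. Note that Python 3's <code>struct</code> raises an error where older Python 2 builds merely warned; the surrogate-pair arithmetic is the standard UTF-16 encoding:

```python
import struct

cp = 0x1F30F  # '🌏', outside the BMP

# A non-BMP code point cannot be serialized as a single 16-bit integer,
# which is what xml2dict.py's ">H" pack format attempts:
try:
    struct.pack(">H", cp)
    overflowed = False
except struct.error:
    overflowed = True
assert overflowed

# UTF-16/UCS-2 instead splits the code point into a surrogate pair:
hi = 0xD800 + ((cp - 0x10000) >> 10)
lo = 0xDC00 + ((cp - 0x10000) & 0x3FF)
assert (hi, lo) == (0xD83C, 0xDF0F)  # i.e. '\uD83C\uDF0F'
```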


=== Misc ===
* {{bug|908487}} may use nodes linked by "next" pointers to know which keys to enlarge.
* When we check to reject some incoming item in a BoundedPriorityQueue, we naïvely compare the item's priority with the threshold of the queue, in [https://github.com/mozilla-b2g/gaia/blob/master/apps/keyboard/js/imes/latin/predictions.js#L890 predictions.js].
** Currently, if the item's priority equals the threshold, the item is immediately rejected.
** However, under such situation, there might exist other factors for consideration if we're to reject the incoming item, or to actually bump off some item in the queue (whose priority equals the threshold).
** Such factors might include:
**# if an item under question is almost fully grown, i.e. <code>remaining</code> is empty, while some other isn't.
**# if an item under question has less corrections than some other.
**# ...
** The idea is that a candidate that has gone further through the growth process might be more likely to become a suggestion than one that hasn't (despite both having the same weight); the same goes for the number of corrections made.


== Unclarified Questions ==
== Unclarified Questions ==
* Is dictionary blob's character table's frequency information not being used by predictions.js?
* Most of the frequency/weight/priority comparison logic involves floating-point comparison but is coded with naïve comparison operators like <code>&lt;=</code> or <code>&gt;=</code>, which is not the best practice for comparing floating-point numbers.
** Do we need to employ epsilon-backed comparison? Does it make a difference (in terms of prediction results & quality)? How large is the difference? Is the difference larger/smaller if we also employ the queue-rejection improvement in [[#Ideas for Future Improvements]] ?
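As a concrete illustration of the concern (the multiplier values are the ones documented in the List of GrowOperations section; the two-candidate scenario is hypothetical):

```python
import math

# Two candidates whose weights are algebraically identical -- the same
# frequency discounted by the same two multipliers, applied in a
# different order -- can still compare unequal in floating point:
w1 = 255 * 0.3 * 0.2   # e.g. an Insertion, then a Substitution
w2 = 255 * 0.2 * 0.3   # the same operations, accumulated the other way
assert w1 != w2              # naïve ==, <=, >= see a difference here
assert math.isclose(w1, w2)  # an epsilon-backed comparison does not
```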

Latest revision as of 08:51, 26 February 2016


Data Structures and Algorithms

Data Structures

To facilitate auto-correction and prediction, our dictionary is represented with a data structure based on a Ternary Search Tree. When we generate a TST from a list of words and their frequencies, we balance the TST such that at any node, the highest-frequency word is found by following the center pointer. In order to be able to find the next-highest-frequency word with the same prefix (i.e. under the same subtree parent), each center child node heads a linked list, linked by "next" pointers, which we may follow.

Source Word List

Currently, the lists of words come from the Android source and are stored in XML format. Each word entry includes a frequency, between 0 and 255, denoting how frequently the word appears in the language. As a special case, a frequency of 0 means profanity -- words that should not be considered a typo, but that should never be suggested explicitly.

The word list is both case- and diacritic-sensitive, and so is the resulting TST blob.

Example TST

Let's take a list of words:

'zila', 'zilian', 'ziliac', 'zart', 'zalarm', 'zset'

And their frequencies are:

Word Frequency
zila 255
zilian 175
ziliac 55
zart 215
zalarm 135
zset 95

The TST looks like this:

[z]
 |
[i     ->     a     ->     s]
 |            |            |
[l]          [r  ->  l]   [e]
 |            |      |     |
[a  ->  i]   [t]    [a]   [t]
        |            |
       [a]          [r]
        |            |
       [n  ->  c]   [m]

The vertical up-to-down edge is a center pointer, while a horizontal left-to-right edge denotes the "next" pointer. (The spacing within a horizontal edge doesn't matter, of course). Note that the ordering of [i -> a -> s] in the second level is important, and cannot be something else like [a -> s -> i] in our case, as the words traversable (through center pointers) from 'i' have a higher top frequency (zila = 255) than those from 'a' (zart = 215), which in turn beat those from 's' (zset = 95). Similarly, [a -> i] in the z-i-l ancestry cannot be [i -> a], as 'zila' is more frequent than 'zilian' and 'ziliac'.

It's worth noting that in real-world scenarios, the concept of "root node" is also a list linked by next pointers -- indeed, not all English words begin with 'z'.
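The construction rule above (siblings ordered by the highest frequency reachable through them) can be sketched in a few lines. This is an illustrative in-memory builder for the example words, not the actual dictionary-blob format or the xml2dict.py code:

```python
# A simplified TST node: `ch` is the character, `center` descends one
# level, `next` stays on the same level.
class Node:
    def __init__(self, ch):
        self.ch, self.center, self.next = ch, None, None

def build(words):
    # words: {word: frequency}. Recursively group by first character,
    # ordering sibling groups by their highest-frequency word.
    def level(items):
        groups = {}
        for word, freq in items:
            groups.setdefault(word[0], []).append((word[1:], freq))
        head = prev = None
        for ch in sorted(groups, key=lambda c: -max(f for _, f in groups[c])):
            node = Node(ch)
            rest = [(w, f) for w, f in groups[ch] if w]
            node.center = level(rest) if rest else None
            if prev:
                prev.next = node
            else:
                head = node
            prev = node
        return head
    return level(list(words.items()))

freqs = {'zila': 255, 'zilian': 175, 'ziliac': 55,
         'zart': 215, 'zalarm': 135, 'zset': 95}
root = build(freqs)

# Following only center pointers spells the highest-frequency word.
word, node = '', root
while node:
    word += node.ch
    node = node.center
assert word == 'zila'

# The second level is ordered i -> a -> s, as in the figure above.
second = root.center
assert [second.ch, second.next.ch, second.next.next.ch] == ['i', 'a', 's']
```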

The "Level" Concept

The idea of "center" and "next" pointers can sound very confusing at first glance. The concept of levels helps clarify it:

  • When we follow the "center" pointer, we advance one level down.
  • When we follow the "next" pointer, we stay at the same level.
  • Level in TST is analogous to character position in a string.

This concept is particularly important because during prediction/auto-correction, we match the input string against the TST, from the beginning of the string and the root of the tree, one character by one character. As level is analogous to character position in the string, when we follow the center pointer one level down, we kind of mean "We believe matching this letter at this character position is pretty good. Let's proceed with the next character.", while when we follow the next pointer, we kind of mean "We believe matching this letter at this character position is not that good. Let's see if matching another letter, at this character position, would be good." -- Then, since the list of "next" nodes are sorted from highest frequency to lowest frequency, we're more likely to hit the "pretty good" candidates before we try the "not that good" candidates.

Now, please remember I've just said "kind of" there. The real matching process is much more complex and the scenario above is very much oversimplified. Please refer to the Algorithms section for the real thing.

Prediction Algorithms

The prediction algorithm is mostly implemented in predictions.js. However, latin.js does some post-processing on the suggestions returned from the prediction engine, in handleSuggestions(), and it's important to understand the underlying dependency the post-processing procedures may have with the prediction algorithms.

Big Picture & Terminology

Note: This sub-section omits many details in order to keep the introduction streamlined.

Terminology is in bold text. I strive to align this Wiki documentation with the terminology in the code, as tracked in bug 1124150.

The prediction mechanism works by finding loose matches for the user's input. The hard part of this process is getting that "loose" part right --- we employ heuristics where we see if slight changes to the user's input can transform the input into a known word, and if so, we say the known word is a loose match for the user input.

The goal for the prediction/matching process is to return a number of suggestions which are possible correct spellings of the user input. We generate such suggestions by growing candidates iteratively.

A candidate is a collection of:

  1. A prefix of a dictionary word.
  2. (Derived from the last point) a specific TST node, reached by traversing from the TST root node down a certain number of levels of center nodes, and optionally moving horizontally with next pointers within a level.
  3. A suffix of the user input.
  4. And other auxiliary information, such as the weight of this candidate, detailed in sub-sections below.

At each iteration, we generate new candidates:

  1. Let's colloquially call the current candidate's prefix the "current prefix".
  2. There are a few dictionary word prefixes, represented by TST nodes that may be reachable from the current candidate's TST node.
    • Let's also colloquially call them "potential new prefixes".
    • Keep in mind that "reachable" includes both the center and the next pointers (details in later sections).
    • These potential new prefixes have the potential to become new individual candidates.
  3. Essentially, a potential new prefix differs from the current prefix by only one character: it's either a new char appended to the current prefix, or a change to the last char of the current prefix.
  4. The potential new prefix can only become a new candidate if such a difference is allowed by GrowOperations, a set of predefined rules for the matching process.
    • GrowOperations determine such eligibility by considering the nature of the new/changed character, the suffix of the user input, and the auxiliary information.
    • Usually, only the first char of the user input suffix is considered, since there is only one char of difference between the current prefix and the potential new prefix.
    • Only those potential new prefixes that fit a GrowOperation will become new candidates, to be further grown in the next iteration.

In the TST example above, if the current candidate is "zar", then the "potential new prefixes" may be "zart", and "zal".

Usually, we go down center nodes. So, the resulting candidate's prefix is usually one char longer than the previous candidate's prefix, and the resulting candidate's user input suffix is usually one char shorter than the previous candidate's suffix. When we finally run out of the user input suffix, the final candidates are sent back to Latin IME as suggestions.

During this matching process, we associate a weight with each candidate we're considering. A candidate's weight is a collective product of that candidate's frequency value (as stored in its TST node), and of how many and what kinds of GrowOperations we have performed during the growth of the candidate to make it look like the user input. We'll introduce the weighting details later in this section. To avoid exponentially growing our search space, candidates and suggestions are stored in bounded priority queues, their weight being the priority. When we go from one iteration to the next, we always retrieve the highest-priority candidates from the queue and grow them. Low-priority candidates are dropped when the queue is full.
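The bounded-queue behaviour can be sketched as follows. This is an illustrative Python model of the idea, not the BoundedPriorityQueue implementation in predictions.js:

```python
import heapq

# A minimal bounded priority queue: keeps at most `capacity` items,
# always pops the highest priority, drops the lowest-priority item
# when a higher-priority one arrives while full.
class BoundedPriorityQueue:
    def __init__(self, capacity):
        self.capacity, self.items = capacity, []  # min-heap of (priority, item)

    def add(self, priority, item):
        if len(self.items) < self.capacity:
            heapq.heappush(self.items, (priority, item))
        elif priority > self.items[0][0]:      # beats the current threshold
            heapq.heapreplace(self.items, (priority, item))
        # else: rejected outright, as described above

    def threshold(self):
        # The lowest priority currently held in the queue.
        return self.items[0][0] if self.items else float('-inf')

    def pop_highest(self):
        best = max(self.items)
        self.items.remove(best)
        heapq.heapify(self.items)
        return best

q = BoundedPriorityQueue(2)
q.add(0.9, 'zila'); q.add(0.5, 'zset'); q.add(0.7, 'zart')
# 'zset' (0.5) was bumped off when 0.7 arrived and the queue was full.
assert sorted(p for p, _ in q.items) == [0.7, 0.9]
assert q.threshold() == 0.7
assert q.pop_highest() == (0.9, 'zila')
```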

Generally speaking, we have a few kinds of heuristic GrowOperations:

GrowOp Description Example
Insertion

The last char of the potential new prefix is not the first char of the user input suffix. In the next iteration, the user input suffix remains the same.

  • Dict word: "zarca"
  • User input: "zaca"
  • Old prefix: "za"
  • User input suffix: "ca"
  • New prefix: "zar"
  • Next-iteration user input suffix: "ca"
Deletion

The last char of the potential new prefix is the second char of the user input suffix; we 'delete' the first char of the user input suffix by dropping the first & second chars in the user input suffix to form the next iteration's suffix.

Note that the candidate's prefix in the next iteration will contain the user input suffix second char in this iteration, so we drop two chars here.

  • Dict word: "zaar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "za"
  • Next-iteration user input suffix: "r"
Transposition

Conceptually, this is "the last char of the potential new prefix is the second char of the user input suffix, and the center node of this TST node contains the first char of the user input suffix".

However, in reality, we are not checking the "and" part. We merely allow the potential new prefix to become the candidate's new prefix in the next iteration, and drop the first char of the user input suffix. In the next iteration, if the first char of the user input suffix (which is also the first char of that in this current iteration) can't be matched against the new prefix's center node character without expensive GrowOperations, the grown candidate will likely be discarded according to the weight/correction limits, as detailed below.

  • Dict word: "zaacr"
  • User input: "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaa"
  • Next-iteration user input suffix: "cr"

Substitution

We just drop the user input suffix's first char.

  • Dict word: "zaxar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zax"
  • Next-iteration user input suffix: "ar"

Additionally, there is indeed a "non-heuristic" GrowOperation:

GrowOp Description Example
Match

The last char of the potential new prefix is the first char of the user input suffix.

  • Dict word: zacry
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zac"
  • Next-iteration user input suffix: "ar"
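The prefix/suffix bookkeeping in the two tables above can be sketched as follows. This is an illustrative model reproducing the tables' examples, not the real predictions.js code:

```python
# `ch` is the last char of the potential new prefix, i.e. the character
# of the TST node we reached for this GrowOperation.
def grow(op, prefix, suffix, ch):
    if op == 'match':           # ch == suffix[0]
        return prefix + ch, suffix[1:]
    if op == 'insertion':       # ch inserted; the suffix is untouched
        return prefix + ch, suffix
    if op == 'deletion':        # suffix[0] deleted; per the table, the prefix
        return prefix, suffix[2:]   # is unchanged and ch (== suffix[1]) is consumed
    if op == 'transposition':   # ch == suffix[1]; suffix[0] is retried next iteration
        return prefix + ch, suffix[0] + suffix[2:]
    if op == 'substitution':    # ch replaces suffix[0]
        return prefix + ch, suffix[1:]
    raise ValueError(op)

# The tables' examples (user input "zaca" / "zacar"):
assert grow('insertion', 'za', 'ca', 'r') == ('zar', 'ca')        # dict word "zarca"
assert grow('deletion', 'za', 'car', 'a') == ('za', 'r')          # dict word "zaar"
assert grow('transposition', 'za', 'car', 'a') == ('zaa', 'cr')   # dict word "zaacr"
assert grow('substitution', 'za', 'car', 'x') == ('zax', 'ar')    # dict word "zaxar"
assert grow('match', 'za', 'car', 'c') == ('zac', 'ar')           # dict word "zacry"
```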

In addition to the weighting mechanism, the prediction engine also imposes a maximum number of corrections that can be made for a user input to match a specific suggestion. Most of the GrowOperations count as corrections, but some specific GrowOperations (such as substituting a character with its diacritic variant or with its capitalized variant) don't. Note we're talking about suggestions here -- a candidate growing into a new candidate retains the count of corrections we have had to make to match the user input. During the process, when we need to make more corrections than allowed for the user input to match a specific candidate, we drop that candidate from the queue right away.

Now back to the weight of the candidates. As said above, this weight is based on the word frequency of the candidate's TST node, but it is reduced when we make GrowOperations on the user's input, to reflect our decreased confidence in making such a suggestion. In the code, we define a few weight multipliers for the various kinds of GrowOperations. Such multipliers change a candidate's weight as it grows and as we go through the matching process. Generally speaking, substituting variant forms and inserting apostrophes does not reduce the weight by much at all. Substituting a nearby key reduces the weight moderately (depending on how near the keys are on the screen). Insertions, deletions and transpositions reduce the weight significantly. A character match, of course, does not reduce the weight at all.

The prediction engine holds a nearbyKeys structure which contains spatial information about keys -- which keys are near each other, and how near they are. This information is provided by the Latin IME, which in turn receives it from the Keyboard App's layout rendering management logic. As said above, the spatial information is used to weigh substitutions.

Finally, the weight of a fully-grown candidate is the weight of the suggestion that the candidate has grown into. Or rather, we can say it's the prediction engine's confidence in that suggestion, for a specific user input.
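The weighting arithmetic described above can be sketched as follows, using the multiplier constants documented in the List of GrowOperations section; the exact bookkeeping in predictions.js differs, so treat this as an illustration only:

```python
MULTIPLIERS = {  # values from the List of GrowOperations section
    'match': 1.0,
    'punctuation_insertion': 0.95,
    'word_extension': 0.4,
    'insertion': 0.3,
    'transposition': 0.3,
    'substitution': 0.2,
    'deletion': 0.1,
}

def candidate_weight(frequency, grow_ops):
    # A candidate's weight: its TST node's frequency, discounted once
    # per GrowOperation performed during its growth.
    weight = float(frequency)
    for op in grow_ops:
        weight *= MULTIPLIERS[op]
    return weight

# Matching every character keeps the full frequency...
assert candidate_weight(255, ['match'] * 4) == 255.0
# ...while a substitution costs far more confidence than an apostrophe insertion.
assert candidate_weight(255, ['substitution']) < candidate_weight(255, ['punctuation_insertion'])
```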

Function Call Breakdown

We're not doing a line-by-line code walkthrough here. This is an illustration of which function calls which, and which does what when called, omitting a few technical subtleties; it is meant to paint (only) the overall picture of the functionality/responsibility of each function.

predict() and getSuggestions()

predict() is called by the Latin IME and commences the whole prediction process. Here, we're provided with the user input and the max numbers of suggestions, candidates, and corrections for each suggestion. The numbers of suggestions and candidates limit our search space, and it should be no surprise that the number of candidates should be larger than that of suggestions. The max number of corrections limits the difference between the user input and the suggestions; we usually allow only 1 or 2 corrections.

This function synchronously sets up some data structures before triggering getSuggestions() asynchronously. getSuggestions() may be regarded as "step 2" of predict() and is only executed once in the whole prediction process. Here, we check for early-return cases (where the user input contains chars that aren't even in the dictionary, or the user input is longer than the longest word in the dictionary). Then, we begin the growing process by adding the first candidate into the candidate queue and calling processCandidates().

And that first candidate is: an empty prefix, and the whole user input string as the suffix; and that's at the root node of the TST. As explained below, we go through the list of next-pointed nodes in an iteration, so we'll get some potential new prefixes with some next nodes of the root node (recall that all these next nodes are at the same level as the root node) that fit some GrowOperations for the first char of the user input.

The Growing Process

As outlined in the previous sub-section, the growing process heuristically GrowOperates a candidate and evaluates such GrowOperations against the user input. It's called growing because we carry out this process character by character, from an empty string (= the TST root node) to a complete dictionary word (= as we level down to a terminal node).

The growing process consists of addCandidate(), processCandidates() and process() cycles. As we've just called addCandidate() at getSuggestions(), let's begin with processCandidates().

processCandidates() iterates through candidates in the queue, and calls process() on each candidate. For each candidate, process() GrowOperates the prefix to match the user input suffix and generates new candidates according to various pieces of information, and calls addCandidate() to add the new candidates to the queue; or, it calls addSuggestion() if a candidate has grown into a suggestion. Not surprisingly, the new candidates will be processed the next time processCandidates() retrieves them from the queue.

Aside from the candidate information described in the big-picture sub-section, process() uses these pieces of auxiliary information to GrowOperate on candidates:

  1. The number of corrections we have made during the growth of the candidate, as we can't make more corrections than max allowed.
  2. The TST node's word frequency and some other weight information (details below in process() and addCandidate()).
  3. The character of the TST node, of its next nodes, and of the center nodes of this TST node and its next nodes; this affects how we weigh GrowOperations.
  4. The difference between the length of the user input and the number of levels down of the TST node (details in the list of GrowOperations below).

The growing process is heuristic, and it halts when we don't have any more candidates to grow in processCandidates(), or when the highest-priority candidate has a weight too low.

Again, when a candidate's growing process has just started, its prefix is an empty string, and the suffix is the user input. When it ends (and we find some suggestion), the prefix is the suggestion that the candidate has grown into, and the suffix is a possibly empty string.

processCandidates()

We retrieve a certain number of candidates from the queue. As the queue is a priority queue, we always retrieve the highest-priority candidate on each queue remove() call.

As said above, the heuristic process halts when we don't have any more candidates in the queue, or when the weight of the highest-priority candidate is too low. "Too low" means the candidate's weight is lower than that of the lowest-priority suggestion in the suggestion queue (i.e. suggestions.threshold), which means the candidate has no chance to grow into a suggestion that can be inserted into the suggestion queue.

The heuristic process halts by processCandidates() returning without calling setTimeout(processCandidates) to re-trigger itself.

Note that the idea of "one" processCandidates() call does not map to the idea of "one" growth iteration for candidates. The number of candidates to retrieve in each processCandidates() call is a fixed constant, relevant only to UX responsiveness (how frequently we want to relinquish execution back to the JS engine's event queue).

addCandidate()

A candidate, along with its related information, is added to the candidate queue here. This information, which process() relies on and which was briefly listed above, formally includes (in function-argument order):

  • pointer: The TST node the candidate's prefix is associated with.
  • output: The (current) prefix that is growing. This will loosely resemble, if not match, a prefix of the user input.
  • remaining: The user input suffix, i.e. the remaining part of the input we haven't attempted to match.
  • multiplier: The multiplicative product of the multipliers from different GrowOperations we've performed during the growth of the candidate.
  • frequency: The candidate node's word frequency.
  • corrections: The number of corrections we have made during the growth of the candidate.

We may also apply a couple of weight-boost mechanisms here.

process()

process() is the core of the growing process. It performs various GrowOperations on one candidate, and calls addCandidate() with the results; in our words, we say the candidate has gone through one growth iteration. If we have reached the end of the user input, then the candidate is supplied to addSuggestion() to be added to the suggestion queue.

In order to see what we may potentially GrowOperate the candidate into, process() goes through the nodes linked by the next pointers of the candidate's associated TST node (and the TST node itself), to form various "potential new prefixes". Let's see a quick example of a single iteration.

  1. Consider the TST in the beginning of the article. Suppose the user input is "zola".
  2. Suppose at the previous iteration, we added a candidate of: prefix "z", user input suffix "ola", with the Match GrowOperation. From the previous iteration to this iteration, we go down the TST by one level and we're now at the [i -> a -> s] level. The node is 'i'.
  3. At the current iteration, output is "z". remaining is "ola". We're at the "i" node. At this moment, we want to loop through all the next-nodes for 'i', i.e. 'i', 'a', 's', and see if the GrowOperations work out on the candidate.
  4. As it turns out, we'll only have:
    • Three Substitution GrowOperations, which result in three new candidates with prefixes 'zi', 'za', 'zs', all with suffix 'la'. The TST nodes associated with the candidates are the subtrees of 'i', 'a', and 's'.
      • If we're specifically talking about QWERTY keyboard, then, within these substitutions, "zi" will weigh more than other substitutions, as 'i' and 'o' are nearby.
    • Three Insertion GrowOperations, which result in three new candidates with prefixes 'zi', 'za', 'zs', all with suffix 'ola'. The TST nodes associated with the candidates are the subtrees of 'i', 'a', and 's'.
  5. Finally, if we use the default weighting and maxCorrections parameters (the latter being 1), then of all six new candidates, only the Substitution GrowOperation for 'i' will actually fully grow into a suggestion. Indeed, our tree doesn't have any other dictionary word that is fewer than two corrections from "zola".

A list of GrowOperations will be introduced in a later section, including their weight multipliers, whether they count as corrections, and so on.

In each process() call, we may try more than one GrowOperation for a candidate (and call addCandidate() for each applicable GrowOperation result).

It's worth noting that not all GrowOperations are mutually exclusive, and for those that are, we always check whether a specific GrowOperation is feasible before checking another. This is codified in process(): some GrowOperations are coded within an if-then-elseif-else chain, alongside other GrowOperations that themselves form a separate if-then-else chain. Note that the order of GrowOperations within one if-then-else chain is important: we always test for the one with the higher multiplier first. Still others use a continue statement that bypasses all remaining GrowOperations for this next node of this candidate.

addSuggestion()

Similar to addCandidate(), addSuggestion() adds the grown suggestion to the suggestion queue. A few subtleties exist:

  1. If the user input has its first letter capitalized, we capitalize the first letter of the suggestion too.
    • Note that the dictionary defines words of different capitalization as different words, possibly with different weights. However, there are some situations where we might want the first letter of a word to be capitalized, while contextually the word should be regarded as a lowercase word in the dictionary. A good example is the beginning of a sentence.
  2. We may get duplicated suggestions, grown from different candidates. We keep the higher-priority one, for sure.

List of GrowOperations

This section details each GrowOperation, including whether it counts as a correction and the associated weight multiplier. You may find the GrowOp names in the actual code.

GrowOp Description Correction? Multiplier Example
Match

The last char of the potential new prefix is the first char of the user input suffix.

No

1

  • Dict word: zacry
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zac"
  • Next-iteration user input suffix: "ar"
Insertion-Punctuation

The last char of the potential new prefix is a punctuation character (an apostrophe, for example), and is not the first char of the user input suffix. In the next iteration, the user input suffix remains the same.

No

PUNCTUATION_INSERTION_MULTIPLIER
= 0.95

  • Dict word: "za'ca"
  • User input: "zaca"
  • Old prefix: "za"
  • User input suffix: "ca"
  • New prefix: "za'"
  • Next-iteration user input suffix: "ca"
Insertion-Any

(This is the "regular" Insertion GrowOp as explained in the #Big Picture & Terminology section.)

Yes

INSERTION_MULTIPLIER
= 0.3

  • Dict word: "zarca"
  • User input: "zaca"
  • Old prefix: "za"
  • User input suffix: "ca"
  • New prefix: "zar"
  • Next-iteration user input suffix: "ca"
Insertion-Extend

The user input suffix is empty, but the TST node is not a terminal node. We effectively append the TST node's char to generate the new prefix.

No

WORD_EXTENSION_MULTIPLIER
= 0.4

  • Dict word: "zarcat"
  • User input: "zarc"
  • Old prefix: "zarc"
  • User input suffix: ""
  • New prefix: "zarca"
  • Next-iteration user input suffix: ""
Deletion

(This is the "regular" Deletion GrowOp as explained in the #Big Picture & Terminology section.)

Yes

DELETION_MULTIPLIER
= 0.1

  • Dict word: "zaar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaa"
  • Next-iteration user input suffix: "r"
Deletion-AtEnd

This is mostly the same as the regular Deletion, except that it applies when we arrive at the terminal node while still having one character left in the user input suffix. The rationale is the same: think of the user input suffix's second char as "nothing", just as this node has no center pointer. The code logic for the regular Deletion GrowOp would fail here, hence the special case.

Yes

DELETION_MULTIPLIER
= 0.1

  • Dict word: "zaar"
  • User input "zaare"
  • Old prefix: "zaar"
  • User input suffix: "e"
  • New prefix: "zaar"
  • Next-iteration user input suffix: ""
Transposition

(Same as explained in the #Big Picture & Terminology section.)

Yes

TRANSPOSITION_MULTIPLIER
= 0.3

  • Dict word: "zaacr"
  • User input: "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaa"
  • Next-iteration user input suffix: "cr"
Substitution-Any

(This is the "regular" Substitution GrowOp as explained in the #Big Picture & Terminology section.)

Yes

SUBSTITUTION_MULTIPLIER
= 0.2

(QWERTY KEYBOARD)

  • Dict word: "zaxar"
  • User input "zapar"
  • Old prefix: "za"
  • User input suffix: "par"
  • New prefix: "zax"
  • Next-iteration user input suffix: "ar"
Substitution-Variant

This is a special Substitution case: the last char of the potential new prefix is a diacritic form of the user input's first char, and/or the two differ only in capitalization. The diacritic part is unidirectional: we perform this GrowOp if the dict word char is a diacritic and the user input char is "plain", but not the other way around.

No

VARIANT_FORM_MULTIPLIER
= 0.99

  • Dict word: "zaçar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaç"
  • Next-iteration user input suffix: "ar"
Substitution-Near

This is a special Substitution case where the last char of the potential new prefix is near the user input's first char as dictated by nearbyKeys.

Yes

Whichever is larger:
Nearness
or
SUBSTITUTION_MULTIPLIER
= 0.2

(QWERTY KEYBOARD)

  • Dict word: "zaxar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zax"
  • Next-iteration user input suffix: "ar"
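The prefix/suffix bookkeeping of the main GrowOps above can be summarized in a short sketch. The growOps() helper is hypothetical; the conditions are inferred from the examples in the table, the multipliers are the documented constants, and the diacritic, nearby-key, and punctuation variants are omitted for brevity.

```javascript
// Multipliers as documented in the table above.
var MULTIPLIERS = {
  'Match': 1,
  'Insertion-Any': 0.3,
  'Insertion-Extend': 0.4,
  'Deletion': 0.1,
  'Transposition': 0.3,
  'Substitution-Any': 0.2
};

// Sketch: given an old prefix, the user-input suffix, and the char of
// the TST node being consumed, list the applicable GrowOps with the
// new prefix/suffix each one produces.
function growOps(prefix, suffix, nodeChar) {
  var ops = [];
  if (suffix[0] === nodeChar) {
    // Match: node char is the suffix's first char.
    ops.push({ op: 'Match', prefix: prefix + nodeChar,
               suffix: suffix.slice(1) });
  } else if (suffix.length > 0) {
    // Substitution-Any: replace the suffix's first char.
    ops.push({ op: 'Substitution-Any', prefix: prefix + nodeChar,
               suffix: suffix.slice(1) });
  }
  // Insertion: the dict word has a char the user did not type;
  // with an empty suffix this becomes Insertion-Extend.
  ops.push({ op: suffix.length > 0 ? 'Insertion-Any' : 'Insertion-Extend',
             prefix: prefix + nodeChar, suffix: suffix });
  if (suffix.length >= 2 && suffix[1] === nodeChar) {
    // Deletion: the user typed an extra char (suffix[0]).
    ops.push({ op: 'Deletion', prefix: prefix + nodeChar,
               suffix: suffix.slice(2) });
    // Transposition: the user swapped two adjacent chars.
    ops.push({ op: 'Transposition', prefix: prefix + nodeChar,
               suffix: suffix[0] + suffix.slice(2) });
  }
  ops.forEach(function (o) { o.multiplier = MULTIPLIERS[o.op]; });
  return ops;
}
```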

Nearness

The nearbyKeys structure is generated by latin.js in nearbyKeys(). The following example lists the keys near 'g', along with their nearness, in the QWERTY EN-US layout, obtained from FxOS running on a Flame.

f: 0.701416015625
h: 0.701416015625
v: 0.49862291921977125
t: 0.423379813234514
y: 0.423379813234514
b: 0.29144229503160113
c: 0.29144229503160113
r: 0.19181633764327974
u: 0.19181633764327974
d: 0.17535400390625
j: 0.17535400390625

Note that in the actual code, the concept is usually written as "distance", but it is really a degree of closeness, since the value is used directly as a multiplier in the Substitution GrowOp.
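As used later in the walkthroughs, the effective Substitution-Near multiplier is the larger of the key's nearness and SUBSTITUTION_MULTIPLIER. A small sketch (the helper name is illustrative; the nearness values are the 'g'-row figures quoted above):

```javascript
// Substitution-Near multiplier: the larger of the nearness and
// SUBSTITUTION_MULTIPLIER (0.2) wins. nearbyG holds a few of the
// 'g'-row nearness values quoted above.
var SUBSTITUTION_MULTIPLIER = 0.2;
var nearbyG = {
  f: 0.701416015625,
  h: 0.701416015625,
  r: 0.19181633764327974,
  u: 0.19181633764327974
};

function substitutionNearMultiplier(nearness) {
  return Math.max(nearness, SUBSTITUTION_MULTIPLIER);
}
```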

Step-by-Step Examples

In this section, we give a few prediction examples for a couple of user-input words against a tiny TST tree, including the candidates' growth iterations and the state of the candidates/suggestions queues. Since our word list is very short, we limit the candidate queue's capacity to 10 and the suggestion queue's to 2, in order to demonstrate the idea of a bounded priority queue and the early-return scenarios. We allow a maximum of 1 correction.

These illustrative examples do not include all GrowOperation instances.

We'll be operating on a regular QWERTY keyboard; distance information is retrieved from FxOS running on a Flame.
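The bounded queues used below can be sketched as follows. BoundedQueue is an illustrative stand-in for the engine's actual queue class: when the queue is full, a new item must beat the lowest weight in the queue (the "threshold") or it is dropped, and adding a better item bumps the lowest one off.

```javascript
// Minimal bounded max-priority queue mirroring the candidate and
// suggestion queues in the walkthrough. Illustrative sketch only.
function BoundedQueue(capacity) {
  this.capacity = capacity;
  this.items = [];          // kept sorted, highest weight first
}

BoundedQueue.prototype.add = function (item) {
  if (this.items.length === this.capacity &&
      item.weight <= this.items[this.items.length - 1].weight) {
    return false;           // at or below the threshold: not added
  }
  this.items.push(item);
  this.items.sort(function (a, b) { return b.weight - a.weight; });
  if (this.items.length > this.capacity) {
    this.items.pop();       // bump the lowest-weight item off
  }
  return true;
};
```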

And here is our list of words and frequencies, directly excerpted from our en_us wordlist:

Word Frequency Normalized Frequency
Alps 97 13
aplenty 55 7
apple 107 14
apply 127 16
ogre 77 10
oral 119 15
organic 123 16

Which gives this TST:

[a       ->       o       ->       A]
 |                |                |
[p]              [r    ->    g]   [l] 
 |                |          |     |
[p    ->    l]   [g -> a]   [r]   [p]
 |          |     |    |     |     |
[l]        [e]   [a]  [l]   [e]   [s]
 |          |     |
[y -> e]   [n]   [n]
            |     |
           [t]   [i]
            |     |
           [y]   [c]
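The same tree can be built programmatically. This sketch uses plain linked nodes with center/next pointers and an eow flag for terminal nodes; the real dictionary is a packed binary blob, and the ordering of siblings along a next-chain (by subtree frequency in the diagram above) is not reproduced here.

```javascript
// Illustrative TST with center/next pointers; Node, insert and
// contains are sketch helpers, not the engine's real structures.
function Node(ch) { this.ch = ch; this.center = null; this.next = null; }

function insert(root, word) {
  // Find (or append) a node for word[0] on this level's next-chain.
  var node = root, prev = null;
  while (node && node.ch !== word[0]) { prev = node; node = node.next; }
  if (!node) {
    node = new Node(word[0]);
    if (prev) prev.next = node;
  }
  if (word.length > 1) {
    node.center = insert(node.center, word.slice(1));
  } else {
    node.eow = true;   // terminal node: a dictionary word ends here
  }
  return root || node;
}

function contains(root, word) {
  var node = root;
  while (node && node.ch !== word[0]) { node = node.next; }
  if (!node) return false;
  if (word.length === 1) return !!node.eow;
  return contains(node.center, word.slice(1));
}

var root = null;
['Alps', 'aplenty', 'apple', 'apply',
 'ogre', 'oral', 'organic'].forEach(function (w) {
  root = insert(root, w);
});
```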
Example user-input "apple"

A set of rowspanned rows denotes one growth iteration of a specific old candidate, possibly into more than one new candidate. We assign artificial IDs to candidates for the sake of clarity in this documentation. A newly added candidate is in bold face in the Candidates Queue column.

The table is long, so the footnotes are put here:

[†] 16 is the frequency of the most-frequent word that we can traverse (downwards) from this [a], corresponding to "apply".

[‡] Nearness is used as multiplier.

[*] EOW denotes a terminal node. It's omitted from the tree above, but it helps to show where a dictionary word ends in our steps.

Columns: Old Candidate (ID; TST node and its center node) | Potential New Candidate (output; remaining; cumulative corrections; multiplier; word frequency) | Explanation | Candidates Queue (node, output, rem, cor, multi, weight) | Suggestions Queue
"" "apple" 0 1 1 The first candidate by getSuggestions() C1: ([a], "", "apple", 0, 1, 1)
C1
[a]
 |
[p]
"a" "pple" 0 1 16 [†] Match with a weight boost at addCandidate() C2: ([p], "a", "pple", 0, 1, 16 + 100)
"a" "apple" 1 0.3 16 Insertion-Any

C2: ([p], "a", "pple", 0, 1, 116)
C3: ([p], "a", "apple", 1, 0.3, 16 * 0.3)

[o]
 |
[r]
"o" "pple" 1 0.2 16 Followed next from [a] to [o]; Substitution-Any

C2: ([p], "a", "pple", 0, 1, 116)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C4: ([r], "o", "pple", 1, 0.2, 16 * 0.2)

"o" "apple" 1 0.3 16 Insertion-Any

C2: ([p], "a", "pple", 0, 1, 116)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 16 * 0.3)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

[A]
 |
[l]
"A" "pple" 0 0.99 13 Followed next from [o] to [A]; Substitution-Variant. Weight boost at addCandidate()

C2: ([p], "a", "pple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 13 * 0.99 + 100)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"A" "apple" 1 0.3 13 Insertion-Any

C2: ([p], "a", "pple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

C2
[p]
 |
[p]
"ap" "ple" 0 1 16 Match with a weight boost at addCandidate()

C8: ([p], "ap", "ple", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"ap" "pple" 1 0.3 16 Insertion-Any

C8: ([p], "ap", "ple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"ap" "ple" 1 0.3 16 Transposition of this [p]'s center child, also a [p]

C8: ([p], "ap", "ple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"ap" "le" 1 0.1 16 Deletion of this [p], and go to this [p]'s center child, also a [p]

C8: ([p], "ap", "ple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C11: ([p], "ap", "le", 1, 0.1, 16 * 0.1)

C8
[p]
 |
[l]
"app" "le" 0 1 16 Match with a weight boost at addCandidate()

C12: ([l], "app", "le", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C11: ([p], "ap", "le", 1, 0.1, 1.6)

"app" "ple" 1 0.3 16 Insertion-Any
At this moment, our candidate queue is full and threshold is 1.6.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C11: ([p], "ap", "le", 1, 0.1, 1.6)

[l]
 |
[e]
"apl" "le" 1 0.423 [‡] 7 Followed next from [p] to [l]; Substitution-Near
This bumps off C11.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.423, 7 * 0.423)

"apl" "ple" 1 0.3 7 Try performing Insertion-Any;
since this results in too low a weight, it's not added.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

"apl" "pe" 1 0.3 7 Try performing Transposition;
Since this results in too low a weight, it's not added.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

"apl" "e" 1 0.1 7 Try performing Deletion;
Since this results in too low a weight, it's not added.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

C12
[l]
 |
[y]
"appl" "e" 0 1 16 Match with a weight boost at addCandidate()

C15: ([y], "appl", "e", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

"appl" "le" 1 0.3 16 Insertion-Any
This bumps C14 off.

C15: ([y], "appl", "e", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

C15
[y]
 |
EOW

[*]

"apply" "" 1 0.2 16 Substitution-Any

C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C17: (EOW, "apply", "", 1, 0.2, 16 * 0.2)

"apply" "e" 1 0.3 16 Insertion-Any
This bumps either C4 or C17 off; say we bump C4 here.

C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C17: (EOW, "apply", "", 1, 0.2, 3.2)

[e]
 |
EOW
"apple" "" 0 1 14 Followed next from [y] to [e], Match with a weight boost at addCandidate()
This bumps C17 off.

C19: (EOW, "apple", "", 0, 1, 14 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)

"apple" "e" 1 0.3 14 Insertion-Any
This bumps C7 off.

C19: (EOW, "apple", "", 0, 1, 114)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)

C19
 EOW
"apple" "" This is a grown suggestion.

C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)

C19: (apple, 14)

C6
[l]
 |
[e]
"Al" "ple" 1 0.423 13 Substitution-Near

C21: ([e], "Al", "ple", 1, 0.99 * 0.423, 13 * 0.99 * 0.423)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)

C19: (apple, 14)

"Al" "pple" 1 0.3 13 Insertion-Any

C21: ([e], "Al", "ple", 1, 0.42, 5.44)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 13 * 0.3)

C19: (apple, 14)

C21 We can't grow without violating maxCorrections constraint.

C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C3 We can't grow without violating maxCorrections constraint.

C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C5 We can't grow without violating maxCorrections constraint.

C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C9
[p]
 |
[l]
"app" "ple" 1 1 16 Match

C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C10
[p]
 |
[l]
"app" "le" 1 1 16 Match

C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C13 We can't grow without violating maxCorrections constraint

C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C16 We can't grow without violating maxCorrections constraint

C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C18 We can't grow without violating maxCorrections constraint

C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C23 We can't grow without violating maxCorrections constraint

C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C24
[l]
 |
[e]
"appl" "e" 1 1 16 Match

C25: ([e], "appl", "e", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C25: ([e], "appl", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C25
[e]
 |
EOW
"apple" "" 1 1 16 Match

C26: (EOW, "apple", "", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C26: (EOW, "apple", "", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C26
 EOW
"apple" "" Grown candidate, but the word is already in Suggestion Queue with a larger weight, so do nothing.

C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C20 We can't grow without violating maxCorrections constraint.

C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C22 We can't grow without violating maxCorrections constraint.

C19: (apple, 14)

So we come up with a suggestion: apple.
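The weights appearing throughout the table above can be reproduced with one small formula. The +100 boost and its zero-corrections condition are inferred from the rows marked "with a weight boost at addCandidate()"; treat this as a sketch rather than the engine's exact code.

```javascript
// Candidate weight as observed in the walkthrough tables: frequency of
// the best reachable word times the cumulative multiplier, plus a +100
// boost at addCandidate() for candidates with no corrections yet.
// The boost value and condition are inferred from the tables.
function candidateWeight(frequency, multiplier, corrections) {
  var weight = frequency * multiplier;
  if (corrections === 0) {
    weight += 100;
  }
  return weight;
}
```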

Example user-input "orfanic"
Columns: Old Candidate (ID; TST node and its center node) | Potential New Candidate (output; remaining; cumulative corrections; multiplier; word frequency) | Explanation | Candidates Queue (node, output, rem, cor, multi, weight) | Suggestions Queue
"" "orfanic" 0 1 1 The first candidate by getSuggestions() C1: ([a], "", "orfanic", 0, 1, 1)
C1
[a]
 |
[p]
"a" "rfanic" 1 0.2 16 Substitution-Any C2: ([p], "a", "rfanic", 1, 0.2, 16 * 0.2)
"a" "orfanic" 1 0.3 16 Insertion-Any

C3: ([p], "a", "orfanic", 1, 0.3, 16 * 0.3)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)

[o]
 |
[r]
"o" "rfanic" 0 1 16 Followed next from [a] to [o]; Match with a weight boost at addCandidate()

C4: ([r], "o", "rfanic", 0, 1, 16 + 100)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)

"o" "orfanic" 1 0.3 16 Insertion-Any

C4: ([r], "o", "rfanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 16 * 0.3)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)

[A]
 |
[l]
"A" "rfanic" 1 0.2 13 Followed next from [o] to [A]; Substitution-Any

C4: ([r], "o", "rfanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 13 * 0.2)

"A" "orfanic" 1 0.3 13 Insertion-Any

C4: ([r], "o", "rfanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 13 * 0.3)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

C4
[r]
 |
[g]
"or" "fanic" 0 1 16 Match with a weight boost at addCandidate()

C8: ([g], "or", "fanic", 0, 1, 16 + 100)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

"or" "rfanic" 1 0.3 16 Insertion-Any

C8: ([g], "or", "fanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

[g]
 |
[r]
"og" "fanic" 1 0.2 10 Followed next from [r] to [g]; Substitution-Near, but the g-r nearness is 0.1918, which is less than SUBSTITUTION_MULTIPLIER = 0.2, so we use 0.2 as the multiplier.

C8: ([g], "or", "fanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 10 * 0.2)

"og" "rfanic" 1 0.3 10 Insertion-Any

C8: ([g], "or", "fanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 10 * 0.3)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 2.0)

C8
[g]
 |
[a]
"org" "anic" 1 0.7014 16 Substitution-Near

C12: ([a], "org", "anic", 1, 0.7014, 16 * 0.7014)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 2.0)

"org" "fanic" 1 0.3 16 Insertion-Any; Queue is now full, threshold = 2.0.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 2.0)

[a]
 |
[l]
"ora" "anic" 1 0.2 15 Followed next from [g] to [a]; Substitution-Any
This bumps C10 off.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C14: ([l], "ora", "anic", 1, 0.2, 15 * 0.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

"ora" "fanic" 1 0.3 15 Insertion-Any; C6 is bumped off.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 15 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C14: ([l], "ora", "anic", 1, 0.2, 3.0)

"ora" "fnic" 1 0.3 15 Transposition; C14 is bumped off.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 15 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

"ora" "nic" 1 0.1 15 Try performing Deletion;
since this results in too low a weight, it's not added.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C12
[a]
 |
[n]
"orga" "nic" 1 1 16 Match

C17: ([n], "orga", "nic", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C17: ([n], "orga", "nic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C17
[n]
 |
[i]
"organ" "ic" 1 1 16 Match

C18: ([i], "organ", "ic", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C18: ([i], "organ", "ic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C18
[i]
 |
[c]
"organi" "c" 1 1 16 Match

C19: ([c], "organi", "c", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C19: ([c], "organi", "c", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C19
[c]
 |
EOW
"organic" "" 1 1 16 Match

C20: (EOW, "organic", "", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C20: (EOW, "organic", "", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20
 EOW
"organic" "" This is a grown suggestion.

C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C3 We can't grow without violating maxCorrections constraint.

C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C5 We can't grow without violating maxCorrections constraint.

C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C9 We can't grow without violating maxCorrections constraint.

C13: ([a], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C13 We can't grow without violating maxCorrections constraint.

C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C15 We can't grow without violating maxCorrections constraint.

C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C16 We can't grow without violating maxCorrections constraint.

C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C7 We can't grow without violating maxCorrections constraint.

C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C2 We can't grow without violating maxCorrections constraint.

C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C11
[r]
 |
[e]
"ogr" "fanic" 1 1 10 Match

C21: ([e], "ogr", "fanic", 1, 0.3 * 1, 10 * 0.3 * 1)

C20: (organic, 11.22)

Other GrowOps will violate maxCorrections constraint.

C21: ([e], "ogr", "fanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C21 We can't grow without violating maxCorrections constraint.

C20: (organic, 11.22)

So we come up with a suggestion: organic.

Example user-input "aplen"

[*] The over-simplification is untruthy -- the actual codes do not drop duplicated candidates. Our over-simplification there would matter if the queue ever becomes full again, but as it won't, it's okay for the purposes to simplify the steps here.

Old
Candidate
Potential New Candidate Explanation Candidates Queue
(node, output, rem, cor, multi, weight)
Suggestions Queue
ID TST Node
and
center node
output remaining Cumulative
corrections
multiplier word
frequency
"" "aplen" 0 1 1 The first candidate by getSuggestions() C1: ([a], "", "aplen", 0, 1, 1)
C1
[a]
 |
[p]
"a" "plen" 0 1 16 Match with a weight boost at addCandidate() C2: ([p], "a", "plen", 0, 1, 16 + 100)
"a" "aplen" 1 0.3 16 Insertion-Any

C2: ([p], "a", "plen", 0, 1, 116)
C3: ([p], "a", "aplen", 1, 0.3, 16 * 0.3)

[o]
 |
[r]
"o" "plen" 1 0.2 16 Followed next from [a] to [o]; Substitute-Any

C2: ([p], "a", "plen", 0, 1, 116)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C4: ([r], "o", "plen", 1, 0.2, 16 * 0.2)

"o" "aplen" 1 0.3 16 Insertion-Any

C2: ([p], "a", "plen", 0, 1, 116)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 16 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

[A]
 |
[l]
"A" "plen" 0 0.99 13 Followed next from [o] to [A]; Substitution-Variant. Weight boost at addCandidate()

C2: ([p], "a", "plen", 0, 1, 116)
C6: ([l], "A", "plen", 0, 0.99, 13 * 0.99 + 100)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"A" "aplen" 1 0.3 13 Insertion-Any

C2: ([p], "a", "plen", 0, 1, 116)
C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C2
[p]
 |
[p]
"ap" "len" 0 1 16 Match with a weight boost at addCandidate()

C8: ([p], "ap", "len", 0, 1, 16 + 100)
C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"ap" "plen" 1 0.3 16 Insertion-Any

C8: ([p], "ap", "len", 0, 1, 116)
C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C8
[p]
 |
[l]
"app" "len" 1 0.3 16 Insertion-Any

C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

[l]
 |
[e]
"apl" "en" 0 1 7 Followed next from [p] to [l]; Match with a weight boost at addCandidate()

C6: ([l], "A", "plen", 0, 0.99, 112.87)
C11: ([e], "apl", "en", 0, 1, 7 + 100)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"apl" "len" 1 0.3 7 Insertion-Any

C6: ([l], "A", "plen", 0, 0.99, 112.87)
C11: ([e], "apl", "en", 0, 1, 107)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)
C12: ([e], "apl", "len", 1, 0.3, 7 * 0.3)

C6
[l]
 |
[p]
"Al" "len" 1 0.423 10 Substitution-Near

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 13 * 0.423)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)
C12: ([e], "apl", "len", 1, 0.3, 2.1)

"Al" "plen" 1 0.3 10 Insertion-Any
At this moment, our candidate queue is full and threshold is 2.1.

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)
C12: ([e], "apl", "len", 1, 0.3, 2.1)

"Al" "pen" 1 0.3 13 Transposition
This bumps off C12.

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"Al" "en" 1 0.1 13 Try performing Deletion;
since this results in too low a weight, it's not added.

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C11
[e]
 |
[n]
"aple" "n" 0 1 7 Match with a weight boost at addCandidate()

C16: ([n], "aple", "n", 0, 1, 7 + 100)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"aple" "en" 1 0.3 7 Try performing Insertion-Any;
since this results in too low a weight, it's not added.

C16: ([n], "aple", "n", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C16
[n]
 |
[t]
"aplen" "" 0 1 7 Match with a weight boost at addCandidate()

C17: ([t], "aplen", "", 0, 1, 7 + 100)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"aplen" "n" 1 0.3 7 Try performing Insertion-Any;
since this results in too low a weight, it's not added.

C17: ([t], "aplen", "", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C17
[t]
 |
[y]
"aplent" "" 0 0.4 7 INsertion-Extend, with a weight boost at addCandidate()

C18: ([y], "aplent", "", 0, 0.4, 7 * 0.4 + 100)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C18
[y]
 |
EOW
"aplenty" "" 0 0.4 7 Insertion-Extend, with a smaller weight boost at addCandidate()

C19: (EOW, "aplenty", "", 0, 0.4 * 0.4, 7 * 0.4 * 0.4 + (7 / 32) * 10)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19
 EOW
"aplenty" "" This is a grown suggestion.

C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C13 We can't grow without violating maxCorrections constraint.

C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C3 We can't grow without violating maxCorrections constraint.

C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C5 We can't grow without violating maxCorrections constraint.

C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C9
[p]
 |
[l]
"app" "len" 1 1 16 Match. The resulting candidate is the same as C10 so we're dropping it here [*]

C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C10
[l]
 |
[e]
"appl" "en" 1 1 16 Match

C20: ([e], "appl", "en", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C20: ([e], "appl", "en", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C20
[e]
 |
EOW
"apple" "n" 1 1 16 Match

C21: ([e], "apple", "n", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C21: ([e], "apple", "n", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C21 We can't grow without violating maxCorrections constraint.
If we allowed 2 corrections, then we could perform a Deletion-AtEnd here and that would eventually grow into a suggestion, though.

C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C7 We can't grow without violating maxCorrections constraint.

C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C14
[p]
 |
[s]
"Alp" "len" 1 1 13 Match

C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C22: ([s], "Alp", "len", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C22: ([s], "Alp", "len", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C15
[p]
 |
[s]
"Alp" "en" 1 1 13 Match

C22: ([s], "Alp", "len", 1, 0.3, 3.9)
C23: ([s], "Alp", "en", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C22: ([s], "Alp", "len", 1, 0.3, 3.9)
C23: ([s], "Alp", "en", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C22 We can't grow without violating maxCorrections constraint.

C23: ([s], "Alp", "en", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C23 We can't grow without violating maxCorrections constraint.

C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C4 We can't grow without violating maxCorrections constraint.

C19: (aplenty, 3.31)

We thus come up with a suggestion: aplenty.
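The candidate tuples traced above can be written down as plain objects. The sketch below is illustrative only: the field names follow the table columns, and makeCandidate is a hypothetical helper, not actual predictions.js code.

```javascript
// Hypothetical helper mirroring the (node, output, remaining, corrections,
// multiplier, frequency) columns in the traces above; not predictions.js code.
function makeCandidate(node, output, remaining, corrections, multiplier, frequency) {
  return {
    node: node,                    // current TST node (and its center node)
    output: output,                // dictionary-word prefix matched so far
    remaining: remaining,          // user input not yet consumed
    corrections: corrections,      // cumulative number of corrections made
    multiplier: multiplier,        // cumulative weight multiplier
    weight: frequency * multiplier // e.g. C20 in the trace: 16 * 0.3 = 4.8
  };
}

// C20 from the trace: ([e], "appl", "en", 1, 0.3, 16 * 0.3)
var c20 = makeCandidate("[e]", "appl", "en", 1, 0.3, 16);
```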

== Misc Bits and Technical Details ==

* We don't have a "whitelist" of punctuation marks (whitelist = punctuation for which insertion GrowOperations don't count as a correction). Rather, one should think of the list of alphabetic characters and their diacritic forms (called variants in the code) as the graylist, for which insertion GrowOperations reduce a growing candidate's weight and count as corrections.
** This can lead to unexpected prediction results when the word list contains characters that are not in variants, but that should not conceptually be treated as whitelisted punctuation marks either. Remember that we whitelist punctuation marks because some languages use them extensively while they may not always be on the default (alphabet) keyboard layout, hence the no-penalty policy.
*** bug 1127776 is an example, where "33" can get corrected to "$US" on the French keyboard ("$US" is in the French word list).
*** To avoid such situations in the future, we need to "graylist" a punctuation mark by putting it into rootToAccentedForm (from which variants is derived), as in the bug.
* As described above, predict() returns to its caller (and the JS event queue), while the prediction process resumes via a setTimeout(). Each processCandidates() call, after processing a specific number of candidates, also returns to the JS event queue and resumes via a setTimeout(). Suggestions may take tens of milliseconds to grow, so these methods are designed to be asynchronous, and we periodically give other code a chance to run. This also gives the prediction process a chance to be abort()'ed.
* A candidate's weight is artificially boosted at addCandidate() if we have not made any corrections during its growth. This ensures that if the user is typing an infrequent word (but typing it correctly), we don't bump the actual word off the list when many frequent words have similar spellings. Still, since the Insertion-Extend GrowOperation doesn't count as a correction but makes less than a perfect match, we boost candidates with such a GrowOp less, by checking the multiplier.
* We can determine whether a priority queue is full by checking its threshold: if the threshold is 0, the queue is not full yet.
** The threshold is the priority of the lowest-priority item in the queue.
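The threshold behavior can be sketched with a minimal bounded priority queue. This is a simplified illustration of the semantics described above, not the actual BoundedPriorityQueue implementation from predictions.js:

```javascript
// Minimal sketch (hypothetical, simplified) of a bounded priority queue whose
// threshold is the priority of its lowest-priority item once it is full,
// and 0 while it is not full.
function BoundedPriorityQueue(capacity) {
  this.capacity = capacity;
  this.items = [];    // kept sorted, highest priority first
  this.threshold = 0; // 0 means "not full yet"
}

BoundedPriorityQueue.prototype.add = function (item, priority) {
  // Naïvely reject items that cannot beat the current threshold.
  if (this.threshold !== 0 && priority <= this.threshold) {
    return false;
  }
  this.items.push({ item: item, priority: priority });
  this.items.sort(function (a, b) { return b.priority - a.priority; });
  if (this.items.length > this.capacity) {
    this.items.pop(); // bump off the lowest-priority item
  }
  if (this.items.length === this.capacity) {
    this.threshold = this.items[this.items.length - 1].priority;
  }
  return true;
};
```

Note how an item whose priority merely equals the threshold is rejected, which is exactly the behavior questioned under the improvement ideas below.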

== Dictionary Blob Structure ==

''Main article: [[Gaia/System/Keyboard/IME/Latin/Dictionary_Blob]]''

Storing the word list and the TST in a plain-text, human-readable form is, of course, not optimal for processing by computer programs, in terms of both storage space utilization and computation performance. Thus, the data structures are stored in a binary dictionary blob that offers better efficiency.

The xml2dict.py script processes the word list, generates the TST along with supporting metadata and auxiliary data structures, and serializes everything into a compact binary blob, which is written out as a *.dict file, to be deserialized and used by the prediction engine.

This procedure is independent of the Gaia build process. If you change a word list's XML file, you must manually run xml2dict.py to generate the corresponding new *.dict file, and then check the new file in to the repository.

== Known Issues ==

''See also: [[Gaia/System/Keyboard/IME/Latin/Dictionary_Blob#Known_Issues]]''

(No content in this section.)

== Ideas for Future Improvements ==

''See also: [[Gaia/System/Keyboard/IME/Latin/Dictionary_Blob#Ideas_for_Future_Improvements]]''

=== Agglutinative Languages ===

* How do we predict for agglutinative languages? See bug 1139255.
** According to the bug comments, naïvely appending one word onto another in the dictionaries won't work ideally.

=== Misc ===

* bug 908487 may use nodes linked by "next" pointers to know which keys to enlarge.
* When deciding whether to reject an incoming item in a BoundedPriorityQueue, we naïvely compare the item's priority with the queue's threshold, in predictions.js.
** Currently, if the item's priority equals the threshold, the item is immediately rejected.
** However, in that situation there might be other factors worth considering before rejecting the incoming item, or before actually bumping off some item already in the queue (whose priority equals the threshold).
** Such factors might include:
**# whether the item in question is almost fully grown, i.e. its remaining input is empty, while the other's isn't;
**# whether the item in question has fewer corrections than the other;
**# ...
** The idea is that a candidate that has gone further through the growth process than another might be more likely to become a suggestion (despite both having the same weight); the same goes for the number of corrections made.
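The tie-breaking idea above could be sketched as a comparator over equal-weight candidates. This is hypothetical code for the proposed improvement, not anything that currently exists in predictions.js:

```javascript
// Hypothetical tie-breaker for equal-weight candidates: prefer the one that
// is closer to fully grown (shorter remaining input), then the one with
// fewer corrections. Returns a negative number when a should rank ahead of b.
function compareCandidates(a, b) {
  if (a.weight !== b.weight) {
    return b.weight - a.weight;                     // higher weight first
  }
  if (a.remaining.length !== b.remaining.length) {
    return a.remaining.length - b.remaining.length; // more fully grown first
  }
  return a.corrections - b.corrections;             // fewer corrections first
}
```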

== Unclarified Questions ==

* Is the frequency information in the dictionary blob's character table unused by predictions.js?
* Most of the frequency/weight/priority comparison logic involves floating-point comparisons but is coded with naïve comparison operators like <= or >=, which is not best practice for comparing floating-point numbers.
** Do we need to employ epsilon-backed comparison? Does it make a difference (in terms of prediction results and quality)? How large is the difference? Is the difference larger or smaller if we also employ the queue-rejection improvement in [[#Ideas for Future Improvements]]?
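For illustration, an epsilon-backed comparison might look like the sketch below; the tolerance value is an arbitrary assumption for this example, not taken from predictions.js:

```javascript
// Sketch of an epsilon-backed "less than or equal" for weights.
// EPSILON is an assumed tolerance, chosen arbitrarily for illustration.
var EPSILON = 1e-9;

function weightLessOrEqual(a, b) {
  return a <= b + EPSILON;
}

// Naïve <= can disagree with the mathematically expected result:
// 0.1 + 0.2 <= 0.3 is false in IEEE 754 floating point, while the
// epsilon-backed version treats the two values as equal.
```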