Gaia/System/Keyboard/IME/Latin/Prediction & Auto Correction

From MozillaWiki
< Gaia‎ | System‎ | Keyboard
Jump to: navigation, search

The Latin IME of Gaia's built-in Keyboard App contains a prediction/auto-correction feature. As its name implies, given a user-input word, it guesses what the user means to type (the "auto-correction" sense), or what the user is planning to type (the "prediction" sense). The feature relies on a dictionary, i.e. a list of words, for a language, and a prediction engine that employs heuristics-based algorithms to match the user input against the dictionary words.

A list of keyboard languages currently supported by Latin IME may be found at L10n:B2G/Fonts.

This article contains introductory materials to the data structures and algorithms upon which the prediction engine is built, and offers illustrative examples where applicable.

Data Structures and Algorithms

Data Structures

To facilitate auto-correction and prediction, our dictionary is represented with a data structure based on Ternary Search Tree. When we generate a TST from a list of words and frequencies of the words, we balance the TST such that at any node, the highest-frequency word is found by following the center pointer. In order to be able to find the next highest-frequency word with the same prefix (i.e. of the same subtree parent), each center child node is a linked list, linked by the "next" pointer, which we may follow.

Source Word List

Currently, the lists of words come from the Android source and is stored in XML format. Each word entry includes a frequency, between 0 and 255, denoting how frequent the word appears in the language. As a special case, a frequency of 0 means profanity -- words that should not be considered a typo, but that should never be suggested explicitly.

The word list is both case- and diacritic-sensitive, and so is the resulting TST blob.

Example TST

Let's take a list of words:

'zila', 'zilian', 'ziliac', 'zart', 'zalarm', 'zset'

And their frequency is:

Word Frequency
zila 255
zilian 175
ziliac 55
zart 215
zalarm 135
zset 95

The TST looks like this:

[z]
 |
[i     ->     a     ->     s]
 |            |            |
[l]          [r  ->  l]   [e]
 |            |      |     |
[a  ->  i]   [t]    [a]   [t]
        |            |
       [a]          [r]
        |            |
       [n  ->  c]   [m]

The vertical up-to-down edge is a center pointer, while a horizontal left-to-right edge denotes the "next" pointer. (The spacing within a horizontal edge doesn't matter, of course). Note that the ordering of [i -> a -> s] in the second level is important, and cannot be something else like [a -> s -> i] in our case, as the words traversable (through center pointers) from 'i' have the highest frequency (zila = 255) than those from 'a' (zart = 215), and in turn, than those from 's' (zset = 95). Similarly, [a -> i] in z-i-l ancestry cannot be [i -> a] as 'zila' is more frequent than 'zilian' and 'ziliac'.

It's worth noting that in real-world scenarios, the concept of "root node" is also a list linked by next pointers -- indeed, not all English words begin with 'z'.

The "Level" Concept

The idea of "center" and "next" pointers can sound very confusing at first glance. The concept of levels can help clarify the idea of "center" and "next" pointers:

  • When we follow the "center" pointer, we advance one level down.
  • When we follow the "next" pointer, we stay at the same level.
  • Level in TST is analogous to character position in a string.

This concept is particularly important because during prediction/auto-correction, we match the input string against the TST, from the beginning of the string and the root of the tree, one character by one character. As level is analogous to character position in the string, when we follow the center pointer one level down, we kind of mean "We believe matching this letter at this character position is pretty good. Let's proceed with the next character.", while when we follow the next pointer, we kind of mean "We believe matching this letter at this character position is not that good. Let's see if matching another letter, at this character position, would be good." -- Then, since the list of "next" nodes are sorted from highest frequency to lowest frequency, we're more likely to hit the "pretty good" candidates before we try the "not that good" candidates.

Now, please remember I've just said "kind of" there. The real matching process is much more complex and the scenario above is very much oversimplified. Please refer to the Algorithms section for the real thing.

Prediction Algorithms

The prediction algorithm is mostly implemented in predictions.js. However, latin.js does some post-processing on the suggestions returned from the prediction engine, in handleSuggestions(), and it's important to understand the underlying dependency the post-processing procedures may have with the prediction algorithms.

Big Picture & Terminology

Note: This sub-section omits many details in order to keep the introduction streamlined.

Terminology is in bold text. I strive to map this Wiki documentation with the terminology in the codes with bug 1124150.

The prediction mechanism works by finding loose matches for the user's input. The hard part of this process is getting that "loose" part right --- we employ heuristics where we see if slight changes to the user's input can transform the input into a known word, and if so, we say the known word is a loose match for the user input.

The goal for the prediction/matching process is to return a number of suggestions which are possible correct spellings of the user input. We generate such suggestions by growing candidates iteratively.

A candidate is a collective information of:

  1. A prefix of a dictionary word.
  2. (Deriving from last point) a specific TST node as we get by traversing from TST root node down a specific levels of center nodes, and optionally moving horizontally with next pointers within a level.
  3. A suffix of the user input.
  4. And other auxiliary information, such as the weight of this candidate, detailed in sub-sections below.

At each iteration, we generate new candidates:

  1. Let's colloquially call the current candidate's prefix as the "current prefix".
  2. There are a few dictionary word prefixes, represented by TST nodes that may be reachable from the current candidate's TST node.
    • Let's also colloquially call them "potential new prefixes".
    • Keep in mind that "reachable" includes both the center and the next pointers (details in later sections).
    • These potential new prefixes have the potential to become new individual candidates.
  3. Essentially, the potential new prefix differs from the current prefix by only one character. It's either a new char appending to the current prefix, or a changed char of the last char of the current prefix.
  4. The potential new prefix can only become a new candidate if such difference is allowed by GrowOperations, a set of predefined rules for the matching process.
    • The GrowOperations determines such eligibility by considering the nature of the new/changed character, the suffix of the user input, and the auxiliary information.
    • Usually, only the first char of the user input suffix is considered, since there is only one char difference between the current prefix and the potential new prefix.
    • And only those potential new fixes that fit the GrowOperation will become new candidates to be further grown in the next iteration.

In the TST example above, if the current candidate is "zar", then the "potential new prefixes" may be "zart", and "zal".

Usually, we go down center nodes. So, the resulting candidate's prefix is usually one char longer than the previous candidate's prefix, and the resulting candidate's user input suffix is usually one char shorter than the previous candidate's suffix. When we finally run out of the user input suffix, the final candidates are sent back to Latin IME as suggestions.

During this matching process, we associate a weight with each candidate we're considering. A weight of a candidate is a collective product of that candidate's frequency value (as stored in its TST node), and how many and what kinds of GrowOperations we have had done during the growth of the candidate to make it look like the user input. We'll introduce the weighting detail later in this section. To avoid exponentially growing our search space, candidates and suggestions are stored in a bounded priority queue, their weight being the priority. When we go from one iteration to another iteration, we always retrieve the highest-priority candidates from the queue and grow candidates from the queue. Low-priority candidates are dropped when the queue is full.

Generally speaking, we have a few kinds of heuristic GrowOperations:

GrowOp Description Example
Insertion

The last char of the potential new prefix is not the first char of the user input suffix. In the next iteration, the user input suffix remains the same.

  • Dict word: "zarca"
  • User input: "zaca"
  • Old prefix: "za"
  • User input suffix: "ca"
  • New prefix: "zar"
  • Next-iteration user input suffix: "ca"
Deletion

The last char of the potential new prefix is the second char of the user input suffix; we 'delete' the first char of the user input suffix by dropping the first & second chars in the user input suffix to form the next iteration's suffix.

Note that the candidate's prefix in the next iteration will contain the user input suffix second char in this iteration, so we drop two chars here.

  • Dict word: "zaar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "za"
  • Next-iteration user input suffix: "r"
Transposition

Conceptually, this is "the last char of the potential new prefix is the second char of the user input suffix, and the center node of the this TST node contains the first char of the user input suffix".

However, in reality, we are not checking the "and" part. We merely allow the potential new prefix to become the candidate's new prefix in the next iteration, and drop the first char of the user input suffix. In the next iteration, if the first char of the user input suffix (which is also the first char of that in this current iteration) can't be matched against the new prefix's center node character without expensive GrowOperations, the grown candidate will be likely discarded according to weight/correction limit as detailed below.

  • Dict word: "zaacr"
  • User input: "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaa"
  • Next-iteration user input suffix: "cr"

Substitution

We just drop the user input suffix's first char.

  • Dict word: "zaxar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zax"
  • Next-iteration user input suffix: "ar"

Additionally, there is indeed a "non-heuristic" GrowOperation:

GrowOp Description Example
Match

The last char of the potential new prefix is the first char of the user input suffix.

  • Dict word: zacry
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zar"
  • Next-iteration user input suffix: "ar"

In addition to the weighting mechanism, the prediction engine also imposes a maximum number of corrections that can be made for a user input to match a specific suggestion. Most of the GrowOperations count as corrections, but some specific GrpwOperations (such as substituting a character with its diacritic variant or with its capitalized variant) don't. Note we're talking about suggestions here -- a candidate growing into a new candidate retains the information of the number of corrections we have had to make to match the user input. During the process, when we need to make more corrections than allowed for the user input to match a specific candidate, we drop that candidate from the queue right away.

Now back to the weight of the candidates. As said above, this weight is based on the word frequency of the candidate's TST node, but it is reduced when we make GrowOperations to the user's input, to reflect our decreased confidence in making such suggestion. In the codes, we define a few weight multipliers when we make various kinds of GrowOperations. Such multipliers changes a candidate's weight as it grows and as we go through the matching process. Generally speaking, substituting variant forms and inserting apostrophes does not reduce the weight by much at all. Substituting a nearby key reduces the weight moderately (depending on how near the keys are on the screen). Insertions, deletions and transpositions reduce the weight significantly. A character match, of course, does not reduce the weight at all.

The prediction engine holds nearbyKeys structure which contains spatial information of keys -- which keys are near each other, and how near they are. Such information is provided by the Latin IME, subsequently provided by Keyboard App's layout rendering management logics. As said above, the spatial information is used to weigh substitutions.

Finally, the weight of a fully-grown candidate is the weight of the suggestion that the candidate has grown into. Or rather, we can say it's the confidence on the suggestion that the prediction engine has, toward a specific user input.

Function Call Breakdown

We're not doing a line-by-line code walkthrough here. This is an illustration of: which function calls which, and which does what when called; but without a few technical subtleties; it is meant to paint (only) the overall picture of the functionality/responsibility of each function.

predict() and getSuggestions()

predict() is the called by Latin IME and is the commencement of the whole prediction process. Here, we're provided with the user input, the max number of suggestions, candidates, and corrections for each suggestion. The number of suggestions and candidates limit our search space, and it should be no surprise that the number of the candidates should be larger than that of the suggestions. The max number of corrections limits the difference between the user input and the suggestions, and we usually allow only 1 or 2 corrections.

This function synchronously set up some data structures before triggering getSuggestions() asynchronously. getSuggestions() may be regarded as "step 2" of predict() and is only executed once in the whole prediction process. Here, we check for early-return cases (where the user input contains chars that aren't even in the dictionary, or the user input is longer than the longest word in the dictionary). Then, we begin the growing process by adding the first candidate into the candidate queue and call processCandidates().

And that first candidate is: an empty prefix, and the whole user input string as the suffix; and that's at the root node of the TST. As explained below, we go through the list of next-pointed nodes in an iteration, so we'll get some potential new prefixes with some next nodes of the root node (recall that all these next nodes are at the same level as the root node) that fit some GrowOperations for the first char of the user input.

The Growing Process

As outlined in the previous sub-section, the growing process heuristically GrowOperates a candidate and evaluate such GrowOperations against the user input. It's called growing because we carry out this process character by character, from an empty string (= TST root node) to a complete dictionary word (= as we level-down to a terminal node).

The growing process consists of addCandidate(), processCandidates() and process() cycles. As we've just called addCandidate() at getSuggestions(), let's begin with processCandidates().

processCandidates() iterates through candidates in the queue, and calls process() on each candidate. For each candidate, process() GrowOperates the prefix to match the user input suffix & generate new candidates according to various pieces of information, and calls addCandidate() to add the new candidates to the queue; or, it calls addSuggestion() if a candidate has grown into a suggestion. Not surprisingly, the new candidates will be processed in the next time processCandidates() retrieves them from the queue.

Aside from the candidate information said in big picture sub-section, process() uses these auxiliary information pieces to GrowOperate on candidates:

  1. The number of corrections we have made during the growth of the candidate, as we can't make more corrections than max allowed.
  2. The TST node's word frequency and some other weight information (details below in process() and addCandidate()).
  3. The character of the TST node, of its next nodes, and of the center nodes of this TST node and its next nodes; this affects how we weigh GrowOperations.
  4. The difference of the length of the user input and the levels-down of the TST node (details in the list of growOperations below).

The growing process is heuristic, and it halts when we don't have any more candidates to grow in processCandidates(), or when the highest-priority candidate has a weight too low.

Again, when a candidate's growing process has just started, its prefix is an empty string, and the suffix is the user input. When it ends (and we find some suggestion), the prefix is the suggestion that the candidate has grown into, and the suffix is a possibly empty string.

processCandidates()

We retrieve a certain number of candidates from the queue. As the queue is a priority queue, we always retrieve the highest-priority candidate on each queue remove() call.

As said above, the heuristic process halts when we don't have any more candidate from the queue, or when the weight of the highest-priority candidate is too low. "Too low" means the candidate's weight is lower than the least-priority suggestion in the suggestion queue (i.e. suggestions.threshold), which means the candidate will have no chance to grow into a suggestion that can be inserted into the suggestion queue.

The heuristic process halts by processCandidates() returning without calling setTimeout(processCandidates) to re-trigger itself.

Note that the idea of "one" processCandidates() call does not map to the idea of "one" growth iteration for candidates. The number of candidates to retrieve in each processCandidates() call is a fixed constant, only relevant to UX responsiveness (how frequently we want to relinquish our execution back to the JS engine's event queue).

addCandidate()

A candidate along with its related information is added to the candidate queue here. The information, as process() relies on and as briefly listed above, formally includes (in function argument's order):

  • pointer: The TST node the candidate's prefix is associated with.
  • output: The (current) prefix, that is growing. This will loosely resemble, if not match, a prefix of the user input.
  • remaining: The user input suffix, i.e. the remaining part of the input we haven't attempted to match.
  • multiplier: The multiplicative product of the multipliers from different GrowOperations we've performed during the growth of the candidate.
  • frequency: The candidate node's word frequency.
  • corrections: The number of corrections we have made during the growth of the candidate.

We may also apply a couple weight boost mechanisms here.

process()

process() is the core of the growing process. It performs various GrowOperations on one candidate, and calls addCandidate() with the results; in our words, we say the candidate has gone through one growth iteration. If we have reached the end of the user input, then the candidate is supplied to addSuggestion to add to the suggestion queue.

In order to see what we may potentially GrowOperate the candidate into, process() goes through the nodes linked by the next pointers of the candidate's associated TST node (and the TST node itself), to form various "potential new prefixes". Let's see a quick example of a single iteration.

  1. Consider the TST in the beginning of the article. Suppose the user input is "zola".
  2. Suppose at the previous iteration, we added a candidate of: prefix "z", user input suffix "ola", with the Match GrowOpeation. From the previous iteration to this iteration, we go down the TST level by one and we're now at the [i -> a -> s] level. The node is 'i'.
  3. At the current iteration, output is "z". remaining is "ola". We're at the "i" node. At this moment, we want to loop through all the next-nodes for 'i', i.e. 'i', 'a', 's', and see if the GrowOperations work out on the candidate.
  4. As it turns out, we'll only have:
    • Three substitution GrowOperations, which results in three new candidates with prefix 'zi', 'za', 'zs'; and all have suffix 'la'. The TST nodes associated with the candidates are different subtrees from 'i', 'a', and 'z'.
      • If we're specifically talking about QWERTY keyboard, then, within these substitutions, "zi" will weigh more than other substitutions, as 'i' and 'o' are nearby.
    • Three insertion GrowOperations, which results in three new candidates with prefix 'zi', 'za', 'zs'; and all have suffix 'ola'. The TST nodes associated with the candidates are different subtrees from 'i', 'a', and 'z'.
  5. Finally, if we use default weighting and maxCorrection parameters (which is 1), for all the six new candidates, only the substitution Grow Operation for 'i' will actually fully grow into a suggestion. Indeed, our tree doesn't have other dictionary words that are less than two corrections from "zola".

A list of GrowOperations will be introduced in a later section, including their weight multipliers, whether they count as corrections, and so on.

In each process() call, we may try more than one GrowOperations for a candidate (and call addCandidate() for each applicable GrowOperation result).

It's worth noting that not all GrowOperations are mutually exclusive, and for those who are, we always check if a specific GrowOperation is feasible first before checking another. This is codified in process(). Some GrowOperations are coded within a if-then-elseif-else chain, while being associative with other GrowOperations who themselves form a separate if-then-else chain. Note that the order of GrowOperations within one if-then-else chain is important: We always test for the higher multiplier one first. Still others put a continue statement that bypasses all remaining GrowOperations, for this next node of this candidate.

addSuggestion()

Similar to addCandidate(), addSuggestion() adds the grown suggestion to the suggestion queue. A few subtleties exist:

  1. If the user input has its first letter capitalized, we capitalize such the first letter in the suggestion too.
    • Note that the dictionary defines words of different capitalization as different words, possibly with different weights. However, there are some situations where we might want the first letter of a word to be capitalized, while contextually the word should be regarded as a lowercase word in the dictionary. A good example is the beginning of a sentence.
  2. We may get duplicated suggestions, grown from different candidates. We keep the higher-priority one, for sure.

List of GrowOperations

This section details each GrowOperation, including whether it counts as a correction and the associated multiplier. You may find the GrowOp names in actual codes.

GrowOp Description Correction? Multiplier Example
Match

The last char of the potential new prefix is the first char of the user input suffix.

No

1

  • Dict word: zacry
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zar"
  • Next-iteration user input suffix: "ar"
Insertion-Punctuation

The last char of the potential new prefix is a punctuation character (an apostrophe, for example), and is not the first char of the user input suffix. In the next iteration, the user input suffix remains the same.

No

PUNCTUATION_INSERTION_MULTIPLIER
= 0.95

  • Dict word: "za'ca"
  • User input: "zaca"
  • Old prefix: "za"
  • User input suffix: "ca"
  • New prefix: "za'"
  • Next-iteration user input suffix: "ca"
Insertion-Any

(This is the "regular" Insertion GrowOp as explained in the #Big Picture & Terminology section.)

Yes

INSERTION_MULTIPLIER
= 0.3

  • Dict word: "zarca"
  • User input: "zaca"
  • Old prefix: "za"
  • User input suffix: "ca"
  • New prefix: "zar"
  • Next-iteration user input suffix: "ca"
Insertion-Extend

The user input suffix is empty, but the TST node is not a terminal node. We effectively append the TST node's char to generate the new prefix.

No

WORD_EXTENSION_MULTIPLIER
= 0.4

  • Dict word: "zarcat"
  • User input: "zarc"
  • Old prefix: "zarc"
  • User input suffix: ""
  • New prefix: "zarca"
  • Next-iteration user input suffix: ""
Deletion

(This is the "regular" Deletion GrowOp as explained in the #Big Picture & Terminology section.)

Yes

DELETION_MULTIPLIER
= 0.1

  • Dict word: "zaar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "za"
  • Next-iteration user input suffix: "r"
Deletion-AtEnd

This is mostly the same as the regular deletion except that it's specifically for when we arrive at the terminal node while still having one character at the user input suffix. The rationale is the same: think that the user input suffix's second char is "nothing", just as we have no center pointer for this node. The code logics for the regular Deletion GrowOp would fail, hence the special case.

Yes

DELETION_MULTIPLIER
= 0.1

  • Dict word: "zaar"
  • User input "zaare"
  • Old prefix: "zaar"
  • User input suffix: "e"
  • New prefix: "zaar"
  • Next-iteration user input suffix: ""
Transposition

(Same as explained in the #Big Picture & Terminology section.)

Yes

TRANSPOSITION_MULTIPLIER
= 0.3

  • Dict word: "zaacr"
  • User input: "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaa"
  • Next-iteration user input suffix: "cr"
Substitution-Any

(This is the "regular" Substitution GrowOp as explained in the #Big Picture & Terminology section.)

Yes

SUBSTITUTION_MULTIPLIER
= 0.2

(QWERTY KEYBOARD)

  • Dict word: "zaxar"
  • User input "zapar"
  • Old prefix: "za"
  • User input suffix: "par"
  • New prefix: "zax"
  • Next-iteration user input suffix: "ar"
Substitution-Variant

This is a special Substitution case: The last char of the potential new prefix is a diacritic form of the user input's first char, and/or they differ in capitalization. The diacritic part is unidirectional: we perform this GrowOp if the dict word char is a diacritic and the user input char is "plain", but not if the dict word char is "plain" and the user input char is a diacritic.

No

VARIANT_FORM_MULTIPLIER
= 0.99

  • Dict word: "zaçar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zaç"
  • Next-iteration user input suffix: "ar"
Substitution-Near

This is a special Substitution case where the last char of the potential new prefix is near the user input's first char as dictated by nearbyKeys.

Yes

Whichever is larger:
Nearness
or
SUBSTITUTION_MULTIPLIER
= 0.2

(QWERTY KEYBOARD)

  • Dict word: "zaxar"
  • User input "zacar"
  • Old prefix: "za"
  • User input suffix: "car"
  • New prefix: "zax"
  • Next-iteration user input suffix: "ar"

Nearness

The nearByKeys structure is generated by latin.js at nearbyKeys(). The following example list records nearby keys to 'g', along with their nearness, in QWERTY EN-US layout, obtained from FxOS running on Flame.

f: 0.701416015625
h: 0.701416015625
v: 0.49862291921977125
t: 0.423379813234514
y: 0.423379813234514
b: 0.29144229503160113
c: 0.29144229503160113
r: 0.19181633764327974
u: 0.19181633764327974
d: 0.17535400390625
j: 0.17535400390625

Note that in the actual codes, the concept is usually written as "distance", but we're actually talking about the degree of closeness since the value is directly used as a multiplier when doing the Subsitution GrowOp.

Step-by-Step Examples

In this section, we give a few prediction examples for a couple user input words, against a tiny TST tree. This includes candidates' growth iterations and the status of the candidates/suggestions queue. Since our word list is very short, we limit to our candidate queue's capacity to 10 and suggestion queue to 2, in order to demonstrate the idea of bounded priority queue and those early return scenarios. We allow a max of 1 correction.

These illustrative examples do not include all GrowOperation instances.

We'll be operating on a regular QWERTY keyboard; Distance information is retrieved from FxOS running on Flame.

And here is our list of words and frequency, directly excerpted form our en_us wordlist:

Word Frequency Normalized Frequency
Alps 97 13
aplenty 55 7
apple 107 14
apply 127 16
ogre 77 10
oral 119 15
organic 123 16

Which gives this TST:

[a       ->       o       ->       A]
 |                |                |
[p]              [r    ->    g]   [l] 
 |                |          |     |
[p    ->    l]   [g -> a]   [r]   [p]
 |          |     |    |     |     |
[l]        [e]   [a]  [l]   [e]   [s]
 |          |     |
[y -> e]   [n]   [n]
            |     |
           [t]   [i]
            |     |
           [y]   [c]
Example user-input "apple"

A set of rowspan'ed rows denotes one growth iteration of a specific old candidate, possibly into more-than-one new candidates. We artificially ID candidates for the sake of clarity for this documentation. A newly added candidate is in bold face in the Candidate Queue column.

The table is long, so the footnotes are put here:

[†] 16 is the frequency of the most-frequent word that we can traverse (downwards) from this [a], corresponding to "apply".

[‡] Nearness is used as multiplier.

[*] EOW denotes a terminal node. It's omitted from the tree above, but provides illustrative assistance to determine the end of a dictionary word in our steps.

Old
Candidate
Potential New Candidate Explanation Candidates Queue
(node, output, rem, cor, multi, weight)
Suggestions Queue
ID TST Node
and
center node
output remaining Cumulative
corrections
multiplier word
frequency
"" "apple" 0 1 1 The first candidate by getSuggestions() C1: ([a], "", "apple", 0, 1, 1)
C1
[a]
 |
[p]
"a" "pple" 0 1 16 [†] Match with a weight boost at addCandidate() C2: ([p], "a", "pple", 0, 1, 16 + 100)
"a" "apple" 1 0.3 16 Insertion-Any

C2: ([p], "a", "pple", 0, 1, 116)
C3: ([p], "a", "apple", 1, 0.3, 16 * 0.3)

[o]
 |
[r]
"o" "pple" 1 0.2 16 Followed next from [a] to [o]; Substitute-Any

C2: ([p], "a", "pple", 0, 1, 116)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C4: ([r], "o", "pple", 1, 0.2, 16 * 0.2)

"o" "apple" 1 0.3 16 Insertion-Any

C2: ([p], "a", "pple", 0, 1, 116)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 16 * 0.3)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

[A]
 |
[l]
"A" "pple" 0 0.99 13 Followed next from [o] to [A]; Substitution-Variant. Weight boost at addCandidate()

C2: ([p], "a", "pple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 13 * 0.99 + 100)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"A" "apple" 1 0.3 13 Insertion-Any

C2: ([p], "a", "pple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

C2
[p]
 |
[p]
"ap" "ple" 0 1 16 Match with a weight boost at addCandidate()

C8: ([p], "ap", "ple", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"ap" "pple" 1 0.3 16 Insertion-Any

C8: ([p], "ap", "ple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"ap" "ple" 1 0.3 16 Transposition of this [p]'s center child, also a [p]

C8: ([p], "ap", "ple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

"ap" "le" 1 0.1 16 Deletion of this [p], and go to this [p]'s center child, also a [p]

C8: ([p], "ap", "ple", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C11: ([p], "ap", "le", 1, 0.1, 16 * 0.1)

C8
[p]
 |
[l]
"app" "le" 0 1 16 Match with a weight boost at addCandidate()

C12: ([l], "app", "le", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C11: ([p], "ap", "le", 1, 0.1, 1.6)

"app" "ple" 1 0.3 16 Insertion-Any
At this moment, our candidate queue is full and threshold is 1.6.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C11: ([p], "ap", "le", 1, 0.1, 1.6)

[l]
 |
[e]
"apl" "le" 1 0.423 [‡] 7 Followed next from [p] to [l]; Substitute-Near
This bumps off C11.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 7 * 0.423)

"apl" "ple" 1 0.3 7 Try performing Insertion-Any;
since this results in too low a weight, it's not added.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

"apl" "pe" 1 0.3 7 Try performing Transposition;
Since this results in too low a weight, it's not added.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

"apl" "e" 1 0.1 7 Try performing Deletion;
Since this results in too low a weight, it's not added.

C12: ([l], "app", "le", 0, 1, 116)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

C12
[l]
 |
[y]
"appl" "e" 0 1 16 Match with a weight boost at addCandidate()

C15: ([y], "appl", "e", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C14: ([e], "apl", "le", 1, 0.42, 2.96)

"appl" "le" 1 0.3 16 Insertion-Any
This bumps C14 off.

C15: ([y], "appl", "e", 0, 1, 16 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)

C15
[y]
 |
EOW

[*]

"apply" "" 1 0.2 16 Substitution-Any

C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C4: ([r], "o", "pple", 1, 0.2, 3.2)
C17: (EOW, "apply", "", 1, 0.2, 16 * 0.2)

"apply" "e" 1 0.3 16 Insertion-Any
This bumps either C4 or C17 off, say we bump C4 here.

C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "apple", 1, 0.3, 3.9)
C17: (EOW, "apply", "", 1, 0.2, 3.2)

[e]
 |
EOW
"apple" "" 0 1 14 Followed next from [y] to [e], Match with a weight boost at addCandidate()
This bumps C17 off.

C19: (EOW, "apple", "", 0, 1, 14 + 100)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C18: (EOW, "apply", "e", 1, 0.3, 4.8)
C7: ([l], "A", "apple", 1, 0.3, 3.9)

"apple" "e" 1 0.3 14 Insertion-Any
This bumps C7 off.

C19: (EOW, "apple", "", 0, 1, 114)
C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)

C19
 EOW
"apple" "" This is a grown suggestion.

C6: ([l], "A", "pple", 0, 0.99, 112.87)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)

C19: (apple, 14)

C6
[l]
 |
[e]
"Al" "ple" 1 0.423 13 Substitution-Near

C21: ([e], "Al", "ple", 1, 0.99 * 0.423, 13 * 0.99 * 0.423)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)

C19: (apple, 14)

"Al" "pple" 1 0.3 13 Insertion-Any

C21: ([e], "Al", "ple", 1, 5.44)
C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 13 * 0.3)

C19: (apple, 14)

C21 We can't grow without violating maxCorrections constraint.

C3: ([p], "a", "apple", 1, 0.3, 4.8)
C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C3 We can't grow without violating maxCorrections constraint.

C5: ([r], "o", "apple", 1, 0.3, 4.8)
C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C5 We can't grow without violating maxCorrections constraint.

C9: ([p], "ap", "pple", 1, 0.3, 4.8)
C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C9
[p]
 |
[l]
"app" "ple" 1 1 16 Match

C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C10: ([p], "ap", "ple", 1, 0.3, 4.8)
C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C10
[p]
 |
[l]
"app" "le" 1 1 16 Match

C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C13: ([l], "app", "ple", 1, 0.3, 4.8)
C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C13 We can't grow without violating maxCorrections constraint

C16: ([y], "appl", "le", 1, 0.3, 4.8)
C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C16 We can't grow without violating maxCorrections constraint

C17: (EOW, "apply", "e", 1, 0.3, 4.8)
C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C17 We can't grow without violating maxCorrections constraint

C23: ([l], "app", "ple", 1, 0.3, 4.8)
C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C23 We can't grow without violating maxCorrections constraint

C24: ([l], "app", "le", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C24
[l]
 |
[e]
"appl" "e" 1 1 16 Match

C25: ([e], "appl", "e", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C25: ([e], "appl", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C25
[e]
 |
EOW
"apple" "" 1 1 16 Match

C26: (EOW, "apple", "", 1, 0.3 * 1, 16 * 0.3 * 1)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

Other GrowOps will violate maxCorrections constraint.

C26: ([e], "appl", "e", 1, 0.3, 4.8)
C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C26
 EOW
"apple" "" Grown candidate, but the word is already in Suggestion Queue with a larger weight, so do nothing.

C20: (EOW, "apple", "e", 1, 0.3, 4.2)
C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C20 We can't grow without violating maxCorrections constraint.

C22: ([e], "Al", "pple", 1, 0.3, 3.9)

C19: (apple, 14)

C22 We can't grow without violating maxCorrections constraint.

C19: (apple, 14)

So we come up with a suggestion: apple.

Example user-input "orfanic"
Old
Candidate
Potential New Candidate Explanation Candidates Queue
(node, output, rem, cor, multi, weight)
Suggestions Queue
ID TST Node
and
center node
output remaining Cumulative
corrections
multiplier word
frequency
"" "orfanic" 0 1 1 The first candidate by getSuggestions() C1: ([a], "", "orfanic", 0, 1, 1)
C1
[a]
 |
[p]
"a" "rfanic" 1 0.2 16 Substitution-Any C2: ([p], "a", "rfanic", 1, 0.2, 16 * 0.2)
"a" "orfanic" 1 0.3 16 Insertion-Any

C3: ([p], "a", "orfanic", 1, 0.3, 16 * 0.3)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)

[o]
 |
[r]
"o" "rfanic" 0 1 16 Followed next from [a] to [o]; Match with a weight boost at addCandidate()

C4: ([r], "o", "rfanic", 0, 1, 16 + 100)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)

"o" "orfanic" 1 0.3 16 Insertion-Any

C4: ([r], "o", "rfanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 16 * 0.3)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)

[A]
 |
[l]
"A" "rfanic" 1 0.2 13 Followed next from [o] to [A]; Substitution-Any

C4: ([r], "o", "rfanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 13 * 0.2)

"A" "orfanic" 1 0.3 13 Insertion-Any

C4: ([r], "o", "rfanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 13 * 0.3)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

C4
[r]
 |
[g]
"or" "fanic" 0 1 16 Match

C8: ([g], "or", "fanic", 0, 1, 16 + 100)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

"or" "rfanic" 1 0.3 16 Insertion-Any

C8: ([g], "or", "fanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

[g]
 |
[r]
"og" "fanic" 1 0.2 10 Followed next from [r] to [g]; Substitution-Near, but g-r distance is 0.1918, which is less than 0.2, so we use 0.2 as multiplier.

C8: ([g], "or", "fanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 10 * 0.2)

"og" "rfanic" 1 0.3 10 Insertion-Any

C8: ([g], "or", "fanic", 0, 1, 116)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 10 * 0.3)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 2.0)

C8
[g]
 |
[a]
"org" "anic" 1 0.7014 16 Substitution-Near

C12: ([a], "org", "anic", 1, 0.7014, 16 * 0.7014)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 2.0)

"org" "fanic" 1 0.3 16 Insertion-Any; Queue is now full, threshold = 2.0.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)
C10: ([r], "og", "fanic", 1, 0.2, 2.0)

[a]
 |
[l]
"ora" "anic" 1 0.2 15 Followed next from [g] to [a]; Substitution-Any
This bumps C10 off.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C14: ([l], "ora", "anic", 1, 0.2, 15 * 0.2)
C6: ([l], "A", "rfanic", 1, 0.2, 2.6)

"ora" "fanic" 1 0.3 15 Insertion-Any; C6 is bumped off.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 15 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)
C14: ([l], "ora", "anic", 1, 0.2, 3.0)

"ora" "fnic" 1 0.3 15 Transposition; C14 is bumped off.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 15 * 0.3)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

"ora" "nic" 1 0.1 15 Try performing Deletion;
since this results in too low a weight, it's not added.

C12: ([a], "org", "anic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C12
[a]
 |
[n]
"orga" "nic" 1 1 16 Match

C17: ([n], "orga", "nic", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C17: ([n], "orga", "nic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C17
[n]
 |
[i]
"organ" "ic" 1 1 16 Match

C18: ([i], "organ", "ic", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C18: ([i], "organ", "ic", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C18
[i]
 |
[c]
"organi" "c" 1 1 16 Match

C19: ([c], "organi", "c", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C19: ([c], "organi", "c", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C19
[c]
 |
EOW
"organic" "" 1 1 16 Match

C20: (EOW, "organic", "", 1, 0.7014 * 1, 16 * 0.7014 * 1)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

Other GrowOps will violate maxCorrections constraint.

C20: (EOW, "organic", "", 1, 0.7014, 11.22)
C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20
 EOW
"organic" "" This is a grown suggestion.

C3: ([p], "a", "orfanic", 1, 0.3, 4.8)
C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C3 We can't grow without violating maxCorrections constraint.

C5: ([r], "o", "orfanic", 1, 0.3, 4.8)
C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C5 We can't grow without violating maxCorrections constraint.

C9: ([g], "or", "rfanic", 1, 0.3, 4.8)
C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C9 We can't grow without violating maxCorrections constraint.

C13: ([r], "org", "fanic", 1, 0.3, 4.8)
C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C13 We can't grow without violating maxCorrections constraint.

C15: ([l], "ora", "fanic", 1, 0.3, 4.5)
C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C15 We can't grow without violating maxCorrections constraint.

C16: ([l], "ora", "fnic", 1, 0.3, 4.5)
C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C16 We can't grow without violating maxCorrections constraint.

C7: ([l], "A", "orfanic", 1, 0.3, 3.9)
C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C7 We can't grow without violating maxCorrections constraint.

C2: ([p], "a", "rfanic", 1, 0.2, 3.2)
C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C2 We can't grow without violating maxCorrections constraint.

C11: ([r], "og", "rfanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C11
[r]
 |
[e]
"ogr" "fanic" 1 1 10 Match

C21: ([e], "ogr", "fanic", 1, 0.3 * 1, 10 * 0.3 * 1)

C20: (organic, 11.22)

Other GrowOps will violate maxCorrections constraint.

C21: ([e], "ogr", "fanic", 1, 0.3, 3.0)

C20: (organic, 11.22)

C21 We can't grow without violating maxCorrections constraint.

C20: (organic, 11.22)

So we come up with a suggestion: organic.

Example user-input "aplen"

[*] The over-simplification is untruthy -- the actual codes do not drop duplicated candidates. Our over-simplification there would matter if the queue ever becomes full again, but as it won't, it's okay for the purposes to simplify the steps here.

Old
Candidate
Potential New Candidate Explanation Candidates Queue
(node, output, rem, cor, multi, weight)
Suggestions Queue
ID TST Node
and
center node
output remaining Cumulative
corrections
multiplier word
frequency
"" "aplen" 0 1 1 The first candidate by getSuggestions() C1: ([a], "", "aplen", 0, 1, 1)
C1
[a]
 |
[p]
"a" "plen" 0 1 16 Match with a weight boost at addCandidate() C2: ([p], "a", "plen", 0, 1, 16 + 100)
"a" "aplen" 1 0.3 16 Insertion-Any

C2: ([p], "a", "plen", 0, 1, 116)
C3: ([p], "a", "aplen", 1, 0.3, 16 * 0.3)

[o]
 |
[r]
"o" "plen" 1 0.2 16 Followed next from [a] to [o]; Substitute-Any

C2: ([p], "a", "plen", 0, 1, 116)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C4: ([r], "o", "plen", 1, 0.2, 16 * 0.2)

"o" "aplen" 1 0.3 16 Insertion-Any

C2: ([p], "a", "plen", 0, 1, 116)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 16 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

[A]
 |
[l]
"A" "plen" 0 0.99 13 Followed next from [o] to [A]; Substitution-Variant. Weight boost at addCandidate()

C2: ([p], "a", "plen", 0, 1, 116)
C6: ([l], "A", "plen", 0, 0.99, 13 * 0.99 + 100)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"A" "aplen" 1 0.3 13 Insertion-Any

C2: ([p], "a", "plen", 0, 1, 116)
C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C2
[p]
 |
[p]
"ap" "len" 0 1 16 Match with a weight boost at addCandidate()

C8: ([p], "ap", "len", 0, 1, 16 + 100)
C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"ap" "plen" 1 0.3 16 Insertion-Any

C8: ([p], "ap", "len", 0, 1, 116)
C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C8
[p]
 |
[l]
"app" "len" 1 0.3 16 Insertion-Any

C6: ([l], "A", "plen", 0, 0.99, 112.87)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

[l]
 |
[e]
"apl" "en" 0 1 7 Followed next from [p] to [l]; Match with a weight boost at addCandidate()

C6: ([l], "A", "plen", 0, 0.99, 112.87)
C11: ([e], "apl", "en", 0, 1, 7 + 100)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"apl" "len" 1 0.3 7 Insertion-Any

C6: ([l], "A", "plen", 0, 0.99, 112.87)
C11: ([e], "apl", "en", 0, 1, 107)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)
C12: ([e], "apl", "len", 1, 0.3, 7 * 0.3)

C6
[l]
 |
[p]
"Al" "len" 1 0.423 10 Substitution-Near

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 13 * 0.423)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)
C12: ([e], "apl", "len", 1, 0.3, 2.1)

"Al" "plen" 1 0.3 10 Insertion-Any
At this moment, our candidate queue is full and threshold is 2.1.

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)
C12: ([e], "apl", "len", 1, 0.3, 2.1)

"Al" "pen" 1 0.3 13 Transposition
This bumps off C12.

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"Al" "en" 1 0.1 13 Try performing Deletion;
since this results in too low a weight, it's not added.

C11: ([e], "apl", "en", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C11
[e]
 |
[n]
"aple" "n" 0 1 7 Match with a weight boost at addCandidate()

C16: ([n], "aple", "n", 0, 1, 7 + 100)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"aple" "en" 1 0.3 7 Try performing Insertion-Any;
since this results in too low a weight, it's not added.

C16: ([n], "aple", "n", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C16
[n]
 |
[t]
"aplen" "" 0 1 7 Match with a weight boost at addCandidate()

C17: ([t], "aplen", "", 0, 1, 7 + 100)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

"aplen" "n" 1 0.3 7 Try performing Insertion-Any;
since this results in too low a weight, it's not added.

C17: ([t], "aplen", "", 0, 1, 107)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C17
[t]
 |
[y]
"aplent" "" 0 0.4 7 INsertion-Extend, with a weight boost at addCandidate()

C18: ([y], "aplent", "", 0, 0.4, 7 * 0.4 + 100)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C18
[y]
 |
EOW
"aplenty" "" 0 0.4 7 Insertion-Extend, with a smaller weight boost at addCandidate()

C19: (EOW, "aplenty", "", 0, 0.4 * 0.4, 7 * 0.4 * 0.4 + (7 / 32) * 10)
C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19
 EOW
"aplenty" "" This is a grown suggestion.

C13: ([p], "Al", "len", 1, 0.423, 5.5)
C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C13 We can't grow without violating maxCorrections constraint.

C3: ([p], "a", "aplen", 1, 0.3, 4.8)
C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C3 We can't grow without violating maxCorrections constraint.

C5: ([r], "o", "aplen", 1, 0.3, 4.8)
C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C5 We can't grow without violating maxCorrections constraint.

C9: ([p], "ap", "plen", 1, 0.3, 4.8)
C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C9
[p]
 |
[l]
"app" "len" 1 1 16 Match. The resulting candidate is the same as C10 so we're dropping it here [*]

C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C10: ([l], "app", "len", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C10
[l]
 |
[e]
"appl" "en" 1 1 16 Match

C20: ([e], "appl", "en", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C20: ([e], "appl", "en", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C20
[e]
 |
EOW
"apple" "n" 1 1 16 Match

C21: ([e], "apple", "n", 1, 0.3, 16 * 0.3)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C21: ([e], "apple", "n", 1, 0.3, 4.8)
C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C21 We can't grow without violating maxCorrections constraint.
If we allowed 2 corrections, then we could perform a Deletion-AtEnd here and that would eventually grow into a suggestion, though.

C7: ([l], "A", "aplen", 1, 0.3, 3.9)
C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C7 We can't grow without violating maxCorrections constraint.

C14: ([p], "Al", "plen", 1, 0.3, 3.9)
C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C14
[p]
 |
[s]
"Alp" "len" 1 1 13 Match

C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C22: ([s], "Alp", "len", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C15: ([p], "Al", "pen", 1, 0.3, 3.9)
C22: ([s], "Alp", "len", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C15
[p]
 |
[s]
"Alp" "en" 1 1 13 Match

C22: ([s], "Alp", "len", 1, 0.3, 3.9)
C23: ([s], "Alp", "en", 1, 0.3, 13 * 0.3)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

Other GrowOps will violate maxCorrections constraint.

C22: ([s], "Alp", "len", 1, 0.3, 3.9)
C23: ([s], "Alp", "en", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C22 We can't grow without violating maxCorrections constraint.

C23: ([s], "Alp", "en", 1, 0.3, 3.9)
C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C23 We can't grow without violating maxCorrections constraint.

C4: ([r], "o", "plen", 1, 0.2, 3.2)

C19: (aplenty, 3.31)

C4 We can't grow without violating maxCorrections constraint.

C19: (aplenty, 3.31)

We thus come up with a suggestion: aplenty.

Misc Bits and Technical Details

  • We don't have a "whitelist" of punctuation marks (whitelist = for which insertion GrowOperations don't count as a correction). Rather, one should think of the list of alphabets and their diacritic forms (called variants in the codes) as the graylist for which insertion GrowOperations reduce a growing candidate's weight, and count as corrections.
    • This can lead to unexpected prediction results when the word list contains characters which are not in variants, but which should also not be conceptually treated as whitelisted punctuation marks either. Remember that we whitelist punctuation marks because some languages use them extensively while they may not be always on the default (alphabet) keyboard layout, hence the no-penalty policy.
      • bug 1127776 is an example, where "33" can get corrected to "$US" in French keyboard ("$US" is in the French word list).
      • To avoid such situation in the future, we need to "graylist" a punctuation mark by putting it into rootToAccentedForm (from which variants is derived), as in the bug.
  • As described above, predict() returns to its caller (and the JS event queue), while the prediction process resumes by a setTimeout(). Each processCandidates() call, after processing a specific number of candidates, also returns to the JS event queue and resumes by a setTimeout(). Suggestions may take tens of milliseconds to grow, so these methods are designed to be asynchronous, and we periodically give other code a chance to run. This also gives chance for the prediction process to be abort()'ed.
  • A candidate's weight is artificially boosted at addCandidate() if we have not made any corrections during its growth. This is to ensure that if the user is typing an infrequent word (but typing it correctly), we don't bump the actual word off the list if there are lots of frequent words that have similar spellings. Still, since Insertion-Extend GrowOperation doesn't count as a correction but still makes less than a perfect match, we boost candidates with such GrowOp less by checking the multiplier.
  • We can determine if a priority queue is not full by checking if its threshold is 0 or not. If it's 0, then the queue is not full yet.
    • Threshold is the priority of the lowest-priority item in the queue.

Dictionary Blob Structure

Main article: Gaia/System/Keyboard/IME/Latin/Dictionary_Blob

Storing a plain-text & human-readable dictionary of words and TST is, of course, not optimal for processing by computer programs, in terms of both storage space utilization and computation performance. Thus, the data structures are stored in a binary dictionary blob that offers better efficiency.

The xml2dict.py script processes the word list, generates the TST along with supporting metadata and auxiliary data structures, and serializes it into a compact binary blob, which is written as the *.dict file, to be unserialized and used by the predictions engine.

This procedure is independent to Gaia build process. If you change a word list's XML file, you must manually run xml2dict.py to generate a new corresponding dict file, and then check in the new file to the repository.

Known Issues

See also: Gaia/System/Keyboard/IME/Latin/Dictionary_Blob#Known_Issues

(No contents in this section)

Ideas for Future Improvements

See also: Gaia/System/Keyboard/IME/Latin/Dictionary_Blob#Ideas_for_Future_Improvements

Agglutinative Languages

  • How do we predict for agglutinative languages? See bug 1139255.
    • According to the bug comments, naïve appending a word onto another word in the dictionaries won't work ideally.

Misc

  • bug 908487 may use nodes linked by "next" pointers to know which keys to enlarge.
  • When we check to reject some incoming item in a BoundedPriorityQueue, we naïvely compare the item's priority with the threshold of the queue, in predictions.js.
    • Currently, if the item's priority equals the threshold, the item is immediately rejected.
    • However, under such situation, there might exist other factors for consideration if we're to reject the incoming item, or to actually bump off some item in the queue (whose priority equals the threshold).
    • Such factors might include:
      1. if an item under question is almost fully grown, i.e. remaining is empty, while some other isn't.
      2. if an item under question has less corrections than some other.
      3. ...
    • The idea is that a candidate that has gone through growth process more than a candidate that hasn't might be more capable to become a suggestion (despite both having the same weight); the same for number of corrections made.

Unclarified Questions

  • Is dictionary blob's character table's frequency information not being used by predictions.js?
  • Most of the frequency/weight/priority comparison logics involve floating-point comparison but are coded as naïve comparison operators like <= or >=, which are not the best practice for comparing floating-point numbers.
    • Do we need to employ epsilon-backed comparison? Does it make a difference (in terms of prediction results & quality)? How large is the difference? Is the difference larger/smaller if we also employ the queue-rejection improvement in #Ideas for Future Improvements ?