** I chatted with asuth about it and asked how well their tokenizer serves their users and whether he could recommend it to us.
*** What's in their tree right now is mainly just improvements to CJK tokenization.
*** There's a pending patch on {{bug|525537}} that handles case and accent folding, but that's only an improvement to the Porter stemmer. Before we (Firefox) even think about stemming, we need to competently support i18n text in the first place. I also think that stemming, while definitely a good thing in general, is more useful for Thunderbird, where they are indexing the entire bodies of emails. We don't currently plan to index bodies of Web pages. I would expect stemming to be less useful for page titles and especially things like URLs, tags, and keywords.
*** He had also looked at our libintl library recently for the aforementioned bug, and he agreed that it's not featureful or actively developed enough to rely on.
*** When I asked if they'd considered bringing in ICU or any other outside libraries, he mentioned that Mozilla had shied away from bringing in ICU before.