Support:Search Requirements: Difference between revisions

From MozillaWiki
No edit summary
(reorganize)
Line 1: Line 1:
THIS DOCUMENT IS A DRAFT OF OUR SEARCH ENGINE REQUIREMENTS FOR SUMO. IT IS NOT YET FINAL.


==Results==


# Doesn't kill the server
===Source data===
# Handles spelling errors
Results coming from KB and from forum. (Only forum topics that are marked as answered?)
# Category/tag based searches (only articles in a particular category and/or with the specified tags should be matched [https://bugzilla.mozilla.org/show_bug.cgi?id=401163])
 
#* Does it depend on the user (e.g. show Staging copies to contributors?)
Only show articles from the Knowledge Base category, not administrative, staging, or sandbox articles.
#** Yes, but the important thing is that the search engine accepts searches based on category/tags. Then we can use different search queries depending on user groups. (djst)
 
# Should only look at the content, title, and tags of the article and not other features of the page. Right now, searching for "Bookmarks" shows all articles because "Bookmarks" appears in the tag cloud.
Should only look at the content, title, and tags of the article and not other features of the page.
# "Notice" new or changed content within 24 hours.
 
# Do not return multiple results for the same article:
"Notice" new or changed content within 24 hours.
#* Different capitalization [https://bugzilla.mozilla.org/show_bug.cgi?id=399400]
 
#* Different request parameters
Handle tiki formatting correctly (search for "code" should not return all pages that use the code tag)
# Some formatting issues [https://bugzilla.mozilla.org/show_bug.cgi?id=399517]
 
# Handle localization
===Localization===
#* How?
The locale should be detected (and possible to override) like articles. Only the selected/detected locale should be searched. However, many locales will have incomplete translations, which means it would also list content not localized (using the same locale fallback mechanism as defined in [https://bugzilla.mozilla.org/show_bug.cgi?id=398353]). In summary, a search should return all results for the current locale + any remaining articles in the fallback locales, but it should never list the same article twice, even if it exists for two locales.  
#** The locale should be detected (and possible to override).
 
#** When a search is performed, only the selected/detected locale should be searched. However, many locales will have incomplete translations, which means it would also list content not localized (using the same locale fallback mechanism as defined in [https://bugzilla.mozilla.org/show_bug.cgi?id=398353])
 
#** In summary, a search should return all results for the current locale + any remaining articles in the fallback locales, but it should never list the same article twice, even if it exists for two locales.  
===Ranking===
# Be able to weigh articles
Be able to weigh articles
#* Based on their tags
* Based on relevance?
#* Based on their poll results
* Based on their tags?
#* Based on their page hit count
* Based on their poll results?
# Handle tiki formatting correctly
* Based on their page hit count?
#* Properly handle the use of it in search (search for "code" should not return all pages that use the code tag)
 
#* Don't display wiki source in search results
Every article should only appear once (a single article can be at multiple URLs because of redirects and page parameters).
# Show statistics on the article
 
#* Show popularity and poll results in search results
 
# "More like this"?
==Performance==
#* I personally don't see the benefit (djst)
 
# Be able to show only forum topics that are marked as answered?
Doesn't bring things to a grinding halt. (Quantify)
 
 
==Fudge factor==
 
*Handle spelling errors ("did you mean...").
*Synonyms (searching for "favorites" also searches for "bookmarks")
*Ignores locale-specific common words ("the", "a", "Firefox")
 
 
==Display==
 
*Show the title of the page, the first paragraph, and maybe the text surrounding the text matched.
*Display results as plain text without Tiki formatting
* Show data on the article?
** Popularity
** Poll results
* "More like this"?
** I personally don't see the benefit (djst)

Revision as of 00:38, 13 May 2008

THIS DOCUMENT IS A DRAFT OF OUR SEARCH ENGINE REQUIREMENTS FOR SUMO. IT IS NOT YET FINAL.

Results

Source data

Results should come from both the Knowledge Base (KB) and the forum. (Open question: only forum topics that are marked as answered?)

Only show articles from the Knowledge Base category, not administrative, staging, or sandbox articles.

The search should only look at the content, title, and tags of the article, not at other features of the page.
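
The two source-data rules above (Knowledge Base category only; title, content, and tags only) could be sketched roughly as follows. This is an illustration, not the SUMO implementation: the `Article` class, the `page_chrome` field, and the category name string are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical category name; the real identifier on SUMO may differ.
SEARCHABLE_CATEGORY = "Knowledge Base"


@dataclass
class Article:
    title: str
    content: str
    category: str
    tags: list = field(default_factory=list)
    # Stands in for page features such as a tag cloud or sidebar (assumption).
    page_chrome: str = ""


def searchable_text(article):
    """Return only the text that should be indexed: title, content, and tags."""
    return " ".join([article.title, article.content, *article.tags])


def build_index_input(articles):
    """Map each Knowledge Base article title to its indexable text.

    Articles in any other category (staging, sandbox, administrative)
    are skipped entirely.
    """
    return {
        a.title: searchable_text(a)
        for a in articles
        if a.category == SEARCHABLE_CATEGORY
    }
```

Because `page_chrome` never reaches the index input, a word that only appears in the tag cloud cannot match every page.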

"Notice" new or changed content within 24 hours.

Handle Tiki formatting correctly (a search for "code" should not return every page that uses the code tag).
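
One way to meet the formatting requirement is to strip markup before the text is indexed. The sketch below covers only a few Tiki constructs (plugin tags such as `{CODE()}`, `((wiki links))`, and bold/italic/underline markers) and the patterns are simplified assumptions; real Tiki syntax has many more cases.

```python
import re

# Simplified patterns; these are assumptions, not a complete Tiki grammar.
_PLUGIN_TAG = re.compile(r"\{[A-Za-z]+(?:\([^)]*\))?\}")    # {CODE()} and {CODE}
_WIKI_LINK = re.compile(r"\(\(([^|)]+)(?:\|([^)]+))?\)\)")  # ((Page)) or ((Page|label))
_INLINE_MARKS = re.compile(r"__|''|===")                    # bold, italic, underline


def to_plain_text(source):
    """Drop markup tokens so that only the visible text is indexed."""
    text = _PLUGIN_TAG.sub(" ", source)
    # Keep the link label if there is one, otherwise the page name.
    text = _WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), text)
    text = _INLINE_MARKS.sub("", text)
    # Collapse the whitespace left behind by removed tokens.
    return " ".join(text.split())
```

The content inside a code plugin is kept, but the tag name itself is not, so the word "code" only matches pages that actually use it in their text.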

Localization

The locale should be detected (and be possible to override) in the same way as for articles. Only the selected or detected locale should be searched. However, many locales will have incomplete translations, so results should also list content that has not been localized, using the same locale fallback mechanism as defined in https://bugzilla.mozilla.org/show_bug.cgi?id=398353. In summary, a search should return all results for the current locale plus any remaining articles in the fallback locales, but it should never list the same article twice, even if it exists in two locales.
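
The "current locale plus remaining fallback articles, never the same article twice" rule can be illustrated with a small sketch. The fallback chain is passed in as a plain list here; how that chain is actually built is defined in the bug referenced above and is not reproduced. Function and parameter names are hypothetical.

```python
def search_with_fallback(matches, locale_chain):
    """Pick one locale per matching article.

    matches: (article_id, locale) pairs that matched the query.
    locale_chain: the user's locale first, then fallback locales in order.
    Returns (article_id, locale) pairs, preferred locale first.
    """
    rank = {locale: i for i, locale in enumerate(locale_chain)}
    best = {}
    for article_id, locale in matches:
        if locale not in rank:
            continue  # locale is not in the chain, so it is never shown
        current = best.get(article_id)
        if current is None or rank[locale] < rank[current]:
            best[article_id] = locale
    # Articles in the user's own locale sort ahead of fallback articles.
    return sorted(best.items(), key=lambda item: (rank[item[1]], item[0]))
```

For example, with the chain `["sv-SE", "en-US"]`, an article translated into Swedish is listed once in Swedish, and an untranslated article is listed once in English.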


Ranking

Be able to weight articles:

  • Based on relevance?
  • Based on their tags?
  • Based on their poll results?
  • Based on their page hit count?
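
If all four signals were used, a weighted sum is one simple way to combine them. The weights below are placeholders chosen for illustration only; the draft leaves open which signals to use and how much each should count. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    relevance: float   # text-match score, normalized to [0, 1]
    tag_match: float   # fraction of query terms found in the tags, [0, 1]
    poll_score: float  # share of positive poll votes, [0, 1]
    hit_share: float   # page hits relative to the most-viewed article, [0, 1]


# Placeholder weights (assumption); they sum to 1 so scores stay in [0, 1].
WEIGHTS = {"relevance": 0.6, "tag_match": 0.2, "poll_score": 0.1, "hit_share": 0.1}


def score(signals):
    """Combine the normalized signals into a single ranking score."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def rank(candidates):
    """Return article titles ordered from highest to lowest score."""
    return sorted(candidates, key=lambda title: score(candidates[title]), reverse=True)
```

Keeping the weights in one table makes it easy to drop a signal (set its weight to 0) once the open questions above are settled.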

Every article should only appear once (a single article can be at multiple URLs because of redirects and page parameters).
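
A minimal sketch of this deduplication, assuming articles are addressed through a `page` query parameter and that a redirect table is available. Both are assumptions made for illustration; the URL layout and the `REDIRECTS` entries are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical redirect table: old page name -> current page name (lowercased).
REDIRECTS = {"clearing cache": "clear the cache"}


def canonical_page(url):
    """Reduce a result URL to one key per article.

    Ignores every parameter except `page`, ignores capitalization,
    and follows a single level of redirect.
    """
    query = parse_qs(urlsplit(url).query)
    name = query.get("page", [""])[0].strip().lower()
    return REDIRECTS.get(name, name)


def dedupe(urls):
    """Keep the first URL seen for each article, preserving result order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_page(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```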


Performance

Doesn't bring the server to a grinding halt. (This still needs to be quantified.)


Fudge factor

  • Handle spelling errors ("did you mean...").
  • Handle synonyms (searching for "favorites" also searches for "bookmarks").
  • Ignore locale-specific common words ("the", "a", "Firefox").
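
The three items above could be prototyped with the Python standard library before committing to a search engine. The vocabulary, synonym table, and stop-word lists below are tiny hypothetical examples, and the 0.8 similarity cutoff is an arbitrary assumption.

```python
import difflib

# Hypothetical data; a real deployment would derive these from the index.
VOCABULARY = ["bookmarks", "cookies", "history", "toolbar"]
SYNONYMS = {"favorites": ["bookmarks"]}
STOP_WORDS = {"en-US": {"the", "a", "firefox"}}


def prepare_query(query, locale="en-US"):
    """Drop locale-specific common words, then add synonyms for each term."""
    stop = STOP_WORDS.get(locale, set())
    terms = [t for t in query.lower().split() if t not in stop]
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def did_you_mean(term):
    """Suggest the closest known term, or None if the term is known or nothing is close."""
    if term in VOCABULARY:
        return None
    close = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
    return close[0] if close else None
```

Stop words are keyed by locale because a word that is noise in one language may be meaningful in another.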


Display

  • Show the title of the page, the first paragraph, and maybe the text surrounding the matched text.
  • Display results as plain text, without Tiki formatting.
  • Show data on the article?
    • Popularity
    • Poll results
  • "More like this"?
    • I personally don't see the benefit (djst)
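
Showing "the text surrounding the matched text" could work along these lines. This is a sketch that assumes the article has already been converted to plain text; the function name and the default `radius` are hypothetical.

```python
def make_snippet(plain_text, term, radius=40):
    """Return the text around the first match of `term`.

    Falls back to the first paragraph when the term does not occur
    in the body (for example, when only the title or a tag matched).
    """
    pos = plain_text.lower().find(term.lower())
    if pos == -1:
        return plain_text.split("\n\n", 1)[0]
    start = max(0, pos - radius)
    end = min(len(plain_text), pos + len(term) + radius)
    # Mark the places where the snippet was cut out of a longer text.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(plain_text) else ""
    return prefix + plain_text[start:end].strip() + suffix
```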