Support:Search Requirements: Difference between revisions

From MozillaWiki
(New page: # Doesn't kill the server # Only articles in a particular category or categories should be displayed as search results (at the moment, only things "Knowledge Base"). Without this, we'll sh...)
 
 
(14 intermediate revisions by 4 users not shown)
Line 1: Line 1:
# Doesn't kill the server
[[Support:GSOC Project Scope and Timeline]]
# Only articles in a particular category or categories should be displayed as search results (at the moment, only things "Knowledge Base"). Without this, we'll show articles that aren't ready or aren't accessible to people.[https://bugzilla.mozilla.org/show_bug.cgi?id=401163]
 
#* Does it depend on the user (e.g. show Staging copies to contributors?)
THIS DOCUMENT IS A DRAFT OF OUR SEARCH ENGINE REQUIREMENTS FOR SUMO. IT IS NOT YET FINAL.
# Should only look at the content and title of the page and not other features of the page. Right now, searching for "Bookmarks" shows all articles because "Bookmarks" appears in the tag cloud.
 
# "Notice" new or changed content within 24 hours.
==Results==
# Do not return multiple results for the same article:
 
#* Different capitalization [https://bugzilla.mozilla.org/show_bug.cgi?id=399400]
===Source data===
#* Different request parameters
Results coming from KB and from forum. (Only forum topics that are marked as answered.)
# Some formatting issues [https://bugzilla.mozilla.org/show_bug.cgi?id=399517]
 
# Handle localization
Only show articles from the Knowledge Base category, not administrative, staging, or sandbox articles.
#* How?
 
# Be able to weight articles
Should only look at the content, title, and tags of the article and not other features of the page.
#* Based on their tags
 
#* Based on their poll results
"Notice" new or changed content within 24 hours.
# Handle tiki formatting correctly
 
#* Properly handle the use of it in search (search for "code" should not return all pages that use the code tag)
Handle tiki formatting correctly (search for "code" should not return all pages that use the code tag)
#* Don't display source in search results
 
#** Parse it?
===Localization===
#** Ignore it?
* A search should be performed on the current locale (as detected already by SUMO and shown in the URL)
# Show statistics on the article
 
#* Popularity
Out of scope for SoC project:
#* Poll results
 
# "More like this"?
*Many locales will have incomplete translations, so the search should also list content that has not been localized yet, using the en-US version
*Logged in users should be able to specify another locale fallback order
* (Future) It should be possible to hard-code another locale fallback for some locales. For example, for the pt-BR locale, falling back to pt-PT first would be better than en-US.
 
In summary, a search should return all results for the current locale + any remaining articles in the fallback locale (en-US), but it should never list the same article twice, even if it exists for both locales.
 
===Ranking===
Be able to weigh articles
* Based on their source (KB articles rank higher than forum threads)
* Based on relevance
* Based on their tags?
* Based on their poll results?
* Based on their page hit count?
 
Every article should only appear once (a single article can be at multiple URLs because of redirects and page parameters).
 
 
==Performance==
 
Doesn't bring things to a grinding halt. (Quantify)
 
 
==Fudge factor==
*Handle partial matches, where most but not all terms are in the article.
 
Out of scope for SoC project:
 
*Handle spelling errors ("did you mean...").
*Synonyms (searching for "favorites" also searches for "bookmarks")
*Ignore locale-specific common words ("the", "a", "Firefox")
 
==Display==
 
*Show the title of the page, the first paragraph, and maybe the text surrounding the text matched.
*Display results as plain text without Tiki formatting
* Show data on the article?
** Popularity
** Poll results
* "More like this"?
** I personally don't see the benefit (djst)
* Advanced search, like AMO has? Specifically 'per page' and 'sort by' sections, and possibly last updated, version, and platform (these could show the relevant SHOWFOR categories, plus Application if/when SUMO expands into Tb and other products).

Latest revision as of 06:56, 27 September 2008

Support:GSOC Project Scope and Timeline

THIS DOCUMENT IS A DRAFT OF OUR SEARCH ENGINE REQUIREMENTS FOR SUMO. IT IS NOT YET FINAL.

Results

Source data

Results coming from KB and from forum. (Only forum topics that are marked as answered.)

Only show articles from the Knowledge Base category, not administrative, staging, or sandbox articles.
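
The two source rules above (Knowledge Base category only, answered forum topics only) can be sketched as a single eligibility check. This is a minimal illustration, not SUMO's actual data model: the Document class, its field names, and the "kb"/"forum" labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record for anything the indexer might see."""
    title: str
    source: str             # "kb" or "forum" (assumed labels)
    category: str = ""      # wiki category, used for KB pages
    answered: bool = False  # only meaningful for forum topics


def is_searchable(doc: Document) -> bool:
    """Return True if the document may appear in search results."""
    if doc.source == "kb":
        # Excludes administrative, staging, and sandbox articles.
        return doc.category == "Knowledge Base"
    if doc.source == "forum":
        return doc.answered
    return False
```

Applying the check at indexing time, rather than at query time, would keep staging and sandbox pages out of the index entirely.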

Should only look at the content, title, and tags of the article and not other features of the page.

"Notice" new or changed content within 24 hours.

Handle tiki formatting correctly (search for "code" should not return all pages that use the code tag)
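
One way to meet the formatting requirement is to strip markup before the text is indexed, so that markup keywords never become search terms. The sketch below assumes a small subset of Tiki syntax ({CODE()}...{CODE} style plugin tags, __bold__, ''italic'', and ((Page|label)) links); it is not a full Tiki parser, and the function name is hypothetical.

```python
import re

# Plugin tags such as {CODE()} or {CODE}; the plugin body itself is kept.
PLUGIN_TAG = re.compile(r"\{[A-Z]+(\([^)]*\))?\}")
# Wiki links: ((Page)) or ((Page|label)).
WIKI_LINK = re.compile(r"\(\(([^)|]+)(?:\|([^)]+))?\)\)")
# Bold and italic markers.
INLINE_MARKS = re.compile(r"__|''")


def to_plain_text(source: str) -> str:
    """Reduce a simplified subset of Tiki markup to indexable plain text."""
    text = PLUGIN_TAG.sub(" ", source)
    # Prefer the link label when there is one, else the page name.
    text = WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), text)
    text = INLINE_MARKS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

The same plain text could also be reused when displaying results, since the page source should not be shown there either.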

Localization

  • A search should be performed on the current locale (as detected already by SUMO and shown in the URL)

Out of scope for SoC project:

  • Many locales will have incomplete translations, so the search should also list content that has not been localized yet, using the en-US version
  • Logged in users should be able to specify another locale fallback order
  • (Future) It should be possible to hard-code another locale fallback for some locales. For example, for the pt-BR locale, falling back to pt-PT first would be better than en-US.

In summary, a search should return all results for the current locale + any remaining articles in the fallback locale (en-US), but it should never list the same article twice, even if it exists for both locales.
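
The summary above can be sketched as a merge over an ordered list of locales. The example assumes every translation of an article shares one identifier (article_id here, a hypothetical field), which is what makes it possible to list an article only once.

```python
from typing import List, NamedTuple


class Article(NamedTuple):
    article_id: str  # hypothetical id shared by all translations
    locale: str
    title: str


def merge_with_fallback(hits: List[Article], locale_order: List[str]) -> List[Article]:
    """Keep one hit per article, preferring locales earlier in locale_order."""
    results, seen = [], set()
    # dict.fromkeys drops repeated locales while keeping their order.
    for locale in dict.fromkeys(locale_order):
        for hit in hits:
            if hit.locale == locale and hit.article_id not in seen:
                seen.add(hit.article_id)
                results.append(hit)
    return results
```

With locale_order set to ["pt-BR", "en-US"] this gives the basic behaviour; passing ["pt-BR", "pt-PT", "en-US"] would cover the hard-coded fallback mentioned as a future item.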

Ranking

Be able to weigh articles

  • Based on their source (KB articles rank higher than forum threads)
  • Based on relevance
  • Based on their tags?
  • Based on their poll results?
  • Based on their page hit count?
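
A rough sketch of the first two ranking criteria follows. The weights are invented for illustration, and the source preference is treated as a boost on the relevance score rather than a strict "KB always first" rule; tags, poll results, and hit counts are left out because the list above still marks them as open questions.

```python
# Hypothetical boost factors; real values would need tuning.
SOURCE_WEIGHT = {"kb": 2.0, "forum": 1.0}


def score(hit: dict) -> float:
    """Combine text relevance (0..1) with a per-source boost."""
    return hit["relevance"] * SOURCE_WEIGHT.get(hit["source"], 1.0)


def rank(hits: list) -> list:
    """Return hits ordered from best to worst score."""
    return sorted(hits, key=score, reverse=True)
```

If KB articles must always come before forum threads, sorting on the pair (source, relevance) instead of a single product would enforce that.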

Every article should only appear once (a single article can be at multiple URLs because of redirects and page parameters).
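
One possible way to enforce this is to reduce every URL to a canonical key before results are listed. The sketch assumes the page name travels in a "page" query parameter and that a one-level redirect table is available; both the parameter name and the table are assumptions for the example.

```python
from urllib.parse import parse_qs, urlsplit


def canonical_key(url: str, redirects: dict) -> str:
    """Map a URL to one key per article.

    Ignores every parameter except the assumed "page" parameter,
    ignores capitalization, and follows one level of redirect.
    """
    query = parse_qs(urlsplit(url).query)
    page = query.get("page", [""])[0].strip().lower()
    return redirects.get(page, page)


def unique_results(urls: list, redirects: dict) -> list:
    """Keep the first URL seen for each article."""
    seen, unique = set(), []
    for url in urls:
        key = canonical_key(url, redirects)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Redirect chains longer than one hop would need the lookup to be repeated until no further redirect is found.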


Performance

Doesn't bring things to a grinding halt. (Quantify)


Fudge factor

  • Handle partial matches, where most but not all terms are in the article.
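
A partial match can be expressed as the fraction of query terms found in the article. In the sketch below, the 0.6 threshold standing in for "most but not all" is an arbitrary assumption, as are the function names.

```python
import re


def term_coverage(query: str, text: str) -> float:
    """Fraction of distinct query terms that occur in the text."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def partial_match(query: str, text: str, min_coverage: float = 0.6) -> bool:
    """True when enough of the query terms appear in the text."""
    return term_coverage(query, text) >= min_coverage
```

The coverage value could also feed into ranking, so that articles matching every term sort above partial matches.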

Out of scope for SoC project:

  • Handle spelling errors ("did you mean...").
  • Synonyms (searching for "favorites" also searches for "bookmarks")
  • Ignore locale-specific common words ("the", "a", "Firefox")
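
Although these items are out of scope for the SoC project, the synonym and common-word ideas are simple to illustrate as a query preprocessing step. The tables below contain only the examples given in the list; their structure and the function name are assumptions.

```python
# Per-locale common words to drop (examples taken from the list above).
STOP_WORDS = {"en-US": {"the", "a", "firefox"}}
# Each term maps to extra terms that should also be searched.
SYNONYMS = {"favorites": {"bookmarks"}}


def expand_query(query: str, locale: str) -> list:
    """Turn a query into a list of term groups.

    Every group is a set of alternatives; an article matches a group
    if it contains any one of the terms in it.
    """
    stop = STOP_WORDS.get(locale, set())
    groups = []
    for word in query.lower().split():
        if word in stop:
            continue
        groups.append({word} | SYNONYMS.get(word, set()))
    return groups
```

Spelling suggestions ("did you mean...") are not covered here; they would need a dictionary of known terms to compare against.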

Display

  • Show the title of the page, the first paragraph, and maybe the text surrounding the text matched.
  • Display results as plain text without Tiki formatting
  • Show data on the article?
    • Popularity
    • Poll results
  • "More like this"?
    • I personally don't see the benefit (djst)
  • Advanced search, like AMO has? Specifically 'per page' and 'sort by' sections, and possibly last updated, version, and platform (these could show the relevant SHOWFOR categories, plus Application if/when SUMO expands into Tb and other products).
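
The first Display item (first paragraph plus the text surrounding the match) could look roughly like the sketch below. It assumes the article body is already plain text with paragraphs separated by blank lines; the function name and the radius default are hypothetical.

```python
def make_snippet(body: str, term: str, radius: int = 40):
    """Return (first paragraph, text around the first match of term).

    The second element is an empty string when the term is not found.
    """
    first_paragraph = body.split("\n\n", 1)[0].strip()
    pos = body.lower().find(term.lower())
    if pos == -1:
        return first_paragraph, ""
    start = max(0, pos - radius)
    end = min(len(body), pos + len(term) + radius)
    # Collapse line breaks so the context fits on one result line.
    context = " ".join(body[start:end].split())
    return first_paragraph, context
```

The cut points here fall on character offsets, so a real implementation would probably want to trim the context to whole words.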