Support:GSOC Project Scope and Timeline
June 25: end of term for GSOC student
June 25: reconnect, follow-up meeting
June 26 - June 28: Install Sphinx and SUMO on the development server.
June 29 - July 4: Develop indexing engine
July 7 - July 10: Develop filtering and weighting engine
July 11 - July 15: Develop search component and search UI
July 16 - July 23: Develop fudge factor improvements
July 24 - July 31: Refinements
Aug 4 - Aug 20: Load testing, UI improvement, caching (not considered part of the GSOC scope)
Indexing engine: Sphinx-based, triggered as a batch job, accesses the Tiki database directly.
Filtering and Weighting Engine
Extended tables based on Sphinx, with a custom UI for admins to add and remove weights.
Weights stored with index for performance reasons.
Search component: replaces Tiki lib/searchlib.php; searches the index and returns results based on the given parameters.
Search UI: replaces Tiki tiki-searchindex.php and tiki-searchindex.tpl to provide the front-end UI for search.
Data will come from knowledge base and forums. The system will be extensible to other Tiki features but this project will only cover kb and forums.
Data searched for will be filterable by:
- kb vs. forums
- by forum thread state (e.g. forum threads that are answered)
- by article type (help vs. troubleshooting)
- by category
- by author of article
- by contributors to the forum thread
- by freshness of data (last modified date for wiki pages, last post date for forum threads)
Filtering information will be part of the index, to improve performance.
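The filters listed above can be illustrated with a minimal Python sketch. It is not the Sphinx or Tiki implementation; the `IndexedDoc` class, its field names, and the `filter_docs` helper are all hypothetical and only show how filter attributes stored alongside each index entry allow filtering without going back to the Tiki database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IndexedDoc:
    """One index entry carrying its filter attributes (hypothetical fields)."""
    title: str
    source: str                          # "kb" or "forum"
    modified: date                       # last modified (kb) or last post date (forum)
    author: str = ""
    category: str = ""
    article_type: Optional[str] = None   # "help" or "troubleshooting" (kb only)
    answered: Optional[bool] = None      # thread state (forum only)
    contributors: frozenset = frozenset()


def matches(doc, *, source=None, answered=None, article_type=None,
            category=None, author=None, contributor=None, modified_since=None):
    """Return True if doc passes every filter that was supplied (None = no filter)."""
    checks = [
        source is None or doc.source == source,
        answered is None or doc.answered == answered,
        article_type is None or doc.article_type == article_type,
        category is None or doc.category == category,
        author is None or doc.author == author,
        contributor is None or contributor in doc.contributors,
        modified_since is None or doc.modified >= modified_since,
    ]
    return all(checks)


def filter_docs(docs, **filters):
    """Keep only the index entries that satisfy all given filters."""
    return [doc for doc in docs if matches(doc, **filters)]
```

For example, `filter_docs(docs, source="forum", answered=True)` would keep only answered forum threads, using nothing but data already present in the index entries.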
Locale is just another type of filtering.
Locale information will be in the index as well.
Searching for "translations of search terms" is beyond the scope of this project.
Returning search results that include translations based on a user-defined fallback is beyond the scope of this project.
Weighting will be done based on:
- source type (article vs. forum)
- field within each source type (each field can be weighted, e.g. title vs. description)
- existence of search term in freetags
- poll results
Weighting info will be stored in the index for performance reasons.
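As a rough illustration of how the weighting criteria above could combine into one score, here is a small Python sketch. The weight values, the dictionary keys (`freetags`, `poll_helpful`), and the scoring formula are assumptions made for this example; the real weights would be set by admins and applied by the search engine.

```python
# Hypothetical weights; in the project these would be admin-configurable.
SOURCE_WEIGHTS = {"kb": 2.0, "forum": 1.0}            # source type
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}    # per-field weight
FREETAG_BONUS = 2.0                                   # term appears in freetags
POLL_WEIGHT = 1.0                                     # scales the poll result


def score(doc: dict, term: str) -> float:
    """Score one document for a single search term (0.0 means no match)."""
    term = term.lower()
    # Weighted count of the term in each text field.
    total = sum(
        weight * doc.get(name, "").lower().split().count(term)
        for name, weight in FIELD_WEIGHTS.items()
    )
    # Bonus when the term is one of the document's freetags.
    if term in (tag.lower() for tag in doc.get("freetags", [])):
        total += FREETAG_BONUS
    if total == 0:
        return 0.0
    # Poll result assumed to be a fraction between 0 and 1.
    total += POLL_WEIGHT * doc.get("poll_helpful", 0.0)
    # Source type scales the whole score.
    return SOURCE_WEIGHTS.get(doc["source"], 1.0) * total
```

Results would then be sorted by descending score. Keeping every input to `score` inside the index entry is what lets ranking avoid extra database queries.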
Indexing will be a batch job.
The last modified date of an article could be used as a means to speed up indexing (avoiding unnecessary reindexing).
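A minimal sketch of that idea in Python, assuming the batch job records when each article was last indexed. The function names and the two dictionaries are hypothetical and stand in for whatever bookkeeping the batch job actually keeps.

```python
from datetime import datetime
from typing import Optional


def needs_reindex(last_modified: datetime, last_indexed: Optional[datetime]) -> bool:
    """An article needs indexing if it was never indexed or changed since then."""
    return last_indexed is None or last_modified > last_indexed


def select_for_reindex(modified_at: dict, indexed_at: dict) -> list:
    """Return ids of articles whose last modified date is newer than their index entry.

    modified_at: article id -> last modified timestamp (from the content tables)
    indexed_at:  article id -> timestamp of the last successful indexing run
    """
    return [
        article_id
        for article_id, modified in modified_at.items()
        if needs_reindex(modified, indexed_at.get(article_id))
    ]
```

Each batch run would then process only the ids returned by `select_for_reindex` and update `indexed_at` afterwards.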
Indexed content should not include Tiki syntax.
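One possible way to remove markup before indexing is a small set of regular expressions, sketched below in Python. The sketch covers only a handful of common Tiki wiki constructs (wiki links, external links, bold, italic, headings) and is an assumption about how stripping could be done, not the project's actual approach; plugins, tables, and other syntax are not handled.

```python
import re

# (pattern, replacement) pairs, applied in order. Illustrative subset only.
_TIKI_RULES = [
    (re.compile(r"\(\(([^|)]+)\|([^)]+)\)\)"), r"\2"),   # ((Page|label)) -> label
    (re.compile(r"\(\(([^)]+)\)\)"), r"\1"),             # ((Page)) -> Page
    (re.compile(r"\[[^\]|]+\|([^\]]+)\]"), r"\1"),       # [url|label] -> label
    (re.compile(r"__(.+?)__"), r"\1"),                   # __bold__ -> bold
    (re.compile(r"''(.+?)''"), r"\1"),                   # ''italic'' -> italic
    (re.compile(r"^!+[ \t]*", re.MULTILINE), ""),        # !Heading -> Heading
]


def strip_tiki_syntax(text: str) -> str:
    """Return text with the markup constructs above reduced to plain words."""
    for pattern, replacement in _TIKI_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Running the indexer on the stripped text keeps markup characters out of the index.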
Need to support boolean logic in search terms: OR, AND, NOT.
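To make the OR, AND, NOT requirement concrete, the following Python sketch evaluates boolean combinations with set operations over a toy inverted index. It is an illustration only; the function names and the `all_of` / `any_of` / `none_of` parameters are hypothetical, and the project itself would rely on the search engine's own query handling.

```python
def build_inverted_index(docs: dict) -> dict:
    """Map each lowercase word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index: dict, all_of=(), any_of=(), none_of=()) -> set:
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of.

    With no terms at all, every indexed document id is returned.
    """
    def postings(term):
        return index.get(term.lower(), set())

    # Start from every known document, then narrow down.
    result = set().union(*index.values())
    for term in all_of:                      # AND: must contain each term
        result &= postings(term)
    if any_of:                               # OR: must contain at least one
        result &= set().union(*(postings(term) for term in any_of))
    for term in none_of:                     # NOT: must not contain the term
        result -= postings(term)
    return result
```

For example, `search(index, all_of=["crashes"], none_of=["firefox"])` corresponds to the query "crashes NOT firefox".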
Caching of search results
Needs to be done, but is not part of the GSOC project; to be scheduled separately.
Handle spelling errors ("did you mean...").
Support synonyms (searching for "favorites" also searches for "bookmarks").
Ignore locale-specific common words ("the", "a", "Firefox"). This will be limited to English for the scope of this project, but will be extensible.
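The three query features above (spelling suggestions, synonyms, and ignored common words) can be sketched in Python as follows. The word lists are tiny examples taken from the text, the function names are hypothetical, and the spelling suggestion uses the standard library's `difflib` purely as a stand-in for whatever technique the project ends up using.

```python
import difflib

# Example data only; keyed by locale so other languages can be added later.
STOP_WORDS = {"en": {"the", "a", "firefox"}}
SYNONYMS = {"favorites": {"bookmarks"}, "bookmarks": {"favorites"}}


def expand_query(query: str, locale: str = "en") -> list:
    """Drop common words, then return one set of alternatives per remaining term."""
    stop_words = STOP_WORDS.get(locale, set())
    terms = []
    for word in query.lower().split():
        if word in stop_words:
            continue
        # Each term matches itself or any of its synonyms.
        terms.append({word} | SYNONYMS.get(word, set()))
    return terms


def did_you_mean(word: str, vocabulary: list):
    """Suggest the closest indexed word, or None if the word is known or nothing is close."""
    word = word.lower()
    close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    if close and close[0] != word:
        return close[0]
    return None
```

Each set returned by `expand_query` would be searched as an OR group, and the groups combined according to the user's query.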
Display of search results
Show the title of the page and the first paragraph (actually the description field). Showing the text surrounding the matched terms is not part of this project.
Display results as plain text without Tiki formatting (the description field will not contain Tiki formatting).
Data shown about the article, such as poll results, will be based on information in the index only, to improve performance.
“More like this” is a separate feature and should be considered out of scope for this project.