Places:Full Text Indexing

TODO: Fix the formatting; this page is badly formatted. Any ideas on how to format code effectively?
== Overview ==




The use cases above will be used to validate the design.
== Database Design ==
TODO: Check the url_table and document it here. The url_table acts as the document table; in addition to its existing columns, it will store the document length.
{| border="1" cellpadding="2"
|+'''Word Table'''
|-
!column!!type!!bytes!!description
|-
|word||varchar||<=100||term for indexing (Open question: should this be Unicode, and if so, how is Unicode stored?)
|-
|wordnum||integer||4||unique id. A 4-byte integer suffices because the number of unique words is expected to be at most a million. Open question: does this estimate hold for non-English languages?
|-
|doc_count||integer||4||number of documents the word occurred in
|-
|word_count||integer||4||number of occurrences of the word
|}
<br>
{| border="1" cellpadding="2"
|+'''Posting Table'''
|-
!column!!type!!bytes!!description
|-
|wordnum||integer||4||foreign key referencing wordnum in the word table
|-
|firstdoc||integer||4||lowest doc id referenced in the block
|-
|flags||tinyint||1||indicates the block type, length of doc list, sequence number
|-
|block||varbinary||<=255||contains encoded document and/or position postings for the word
|}
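
The page does not yet specify how postings are encoded inside <code>block</code>. As a purely illustrative sketch, one common approach is to store the gaps between sorted document ids as variable-length integers, starting a new row whenever the 255-byte limit would be exceeded. Everything below (the function names, the 7-bit varint format, and storing only gaps after <code>firstdoc</code>) is an assumption, not the design decided here; the <code>flags</code> column and position postings are left out.

```python
MAX_BLOCK_BYTES = 255  # size limit of the block column in the Posting Table


def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a byte string made of back-to-back varints."""
    values = []
    value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(value)
            value = shift = 0
    return values


def pack_postings(doc_ids):
    """Split doc ids into (firstdoc, block) pairs; block holds gaps after firstdoc."""
    blocks = []
    firstdoc = prev = None
    block = bytearray()
    for doc_id in sorted(set(doc_ids)):
        if firstdoc is None:
            firstdoc = prev = doc_id
            continue
        gap = encode_varint(doc_id - prev)
        if len(block) + len(gap) > MAX_BLOCK_BYTES:
            # Block is full: emit it and start a new one at this doc id.
            blocks.append((firstdoc, bytes(block)))
            firstdoc, block = doc_id, bytearray()
        else:
            block += gap
        prev = doc_id
    if firstdoc is not None:
        blocks.append((firstdoc, bytes(block)))
    return blocks


def unpack_postings(firstdoc, block):
    """Rebuild the doc id list of one row from firstdoc and its gap block."""
    doc_ids = [firstdoc]
    for gap in decode_varints(block):
        doc_ids.append(doc_ids[-1] + gap)
    return doc_ids
```

With this layout, a lookup for a given document can use <code>firstdoc</code> to skip rows whose range cannot contain it, without decoding their blocks.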
To Do
# We might need a table or two more for ranking efficiently
# Check whether SQLite has a varbinary datatype. It definitely has a BLOB data type.
Note that the table structure is subject to change to improve efficiency. New tables may be added, and columns may be added to or removed from the existing tables.
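
As a starting point for checking the datatypes, the Word and Posting tables could be created in SQLite roughly as follows. This is a minimal sketch, not the final schema: the table names <code>word</code> and <code>posting</code>, the <code>create_index_db</code> helper, the uniqueness constraint on <code>word</code>, and the composite primary key are assumptions. It uses <code>TEXT</code> for the term (SQLite stores <code>TEXT</code> values in a Unicode encoding) and <code>BLOB</code> with a length check in place of varbinary.

```python
import sqlite3

# Assumption: TEXT, INTEGER and BLOB stand in for the varchar, integer/tinyint
# and varbinary types listed in the tables above.
SCHEMA = """
CREATE TABLE word (
    wordnum    INTEGER PRIMARY KEY,          -- unique id
    word       TEXT NOT NULL UNIQUE,         -- term for indexing
    doc_count  INTEGER NOT NULL DEFAULT 0,   -- documents the word occurred in
    word_count INTEGER NOT NULL DEFAULT 0    -- total occurrences of the word
);
CREATE TABLE posting (
    wordnum  INTEGER NOT NULL REFERENCES word(wordnum),
    firstdoc INTEGER NOT NULL,               -- lowest doc id in the block
    flags    INTEGER NOT NULL,               -- block type, doc list length, sequence number
    block    BLOB NOT NULL CHECK (length(block) <= 255),
    PRIMARY KEY (wordnum, firstdoc)          -- assumption: one row per (word, firstdoc)
);
"""


def create_index_db(path=":memory:"):
    """Open a database (in-memory by default) and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_index_db()
conn.execute(
    "INSERT INTO word (word, doc_count, word_count) VALUES (?, ?, ?)",
    ("mozilla", 1, 3),
)
wordnum = conn.execute(
    "SELECT wordnum FROM word WHERE word = ?", ("mozilla",)
).fetchone()[0]
# The block contents here are placeholder bytes, not a real encoding.
conn.execute(
    "INSERT INTO posting (wordnum, firstdoc, flags, block) VALUES (?, ?, ?, ?)",
    (wordnum, 7, 0, bytes([0, 3])),
)
conn.commit()
```

The <code>CHECK</code> constraint enforces the 255-byte limit on <code>block</code>, since SQLite does not enforce declared column lengths on its own.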


== Detailed Design ==