| TODO: Fix the formatting of this page. Any ideas on how to format code effectively?
| |
| == Overview ==
|
| |
|
|
| |
|
| The use cases above will be used to validate the design.
|
| |
| == Database Design ==
| |
|
| |
| TODO: Check the url_table and put it here. The url_table acts as the document table; it will additionally contain the document length.
| |
|
| |
| {| border="1" cellpadding="2"
| |
| |+'''Word Table'''
| |
| |-
| |
| !column!!type!!bytes!!description
| |
| |-
| |
| |word||varchar||<=100||term for indexing (open question: should this be Unicode, and if so, how is Unicode stored?)
| |
| |-
| |
| |wordnum||integer||4||unique id. A 4-byte integer suffices because the number of unique words will be at most a million. Open question: does this hold for non-English languages?
| |
| |-
| |
| |doc_count||integer||4||number of documents the word occurred in
| |
| |-
| |
| |word_count||integer||4||number of occurrences of the word
| |
| |}
| |
| <br>
| |
| {| border="1" cellpadding="2"
| |
| |+'''Posting Table'''
| |
| |-
| |
| !column!!type!!bytes!!description
| |
| |-
| |
| |wordnum||integer||4||foreign key referencing wordnum in the word table
| |
| |-
| |
| |firstdoc||integer||4||lowest doc id referenced in the block
| |
| |-
| |
| |flags||tinyint||1||indicates the block type, length of doc list, sequence number
| |
| |-
| |
| |block||varbinary||<=255||contains encoded document and/or position postings for the word
| |
| |}
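
The two tables above can be sketched as a SQLite schema. This is only an illustration under stated assumptions: the table names <code>word</code> and <code>posting</code>, the lookup index, and the <code>CHECK</code> on the block length are not part of the design, and <code>TEXT</code> and <code>BLOB</code> stand in for the varchar and varbinary columns. SQLite stores <code>TEXT</code> as Unicode (UTF-8 by default), so a non-English term round-trips unchanged.

```python
import sqlite3

# Hypothetical schema: table names, the index, and the CHECK are assumptions.
SCHEMA = """
CREATE TABLE word (
    wordnum    INTEGER PRIMARY KEY,      -- unique id
    word       TEXT NOT NULL UNIQUE,     -- term for indexing (Unicode text)
    doc_count  INTEGER NOT NULL DEFAULT 0,
    word_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE posting (
    wordnum  INTEGER NOT NULL REFERENCES word(wordnum),
    firstdoc INTEGER NOT NULL,           -- lowest doc id in the block
    flags    INTEGER NOT NULL,           -- block type, doc list length, sequence number
    block    BLOB NOT NULL CHECK (length(block) <= 255)
);
CREATE INDEX posting_lookup ON posting (wordnum, firstdoc);
"""


def open_index(path=":memory:"):
    """Open a database and create both tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = open_index()
conn.execute(
    "INSERT INTO word (wordnum, word, doc_count, word_count) VALUES (?, ?, ?, ?)",
    (1, "日本語", 2, 5),
)
conn.execute(
    "INSERT INTO posting (wordnum, firstdoc, flags, block) VALUES (?, ?, ?, ?)",
    (1, 10, 0, bytes([3, 4])),
)

# Look up the posting blocks for a term.
rows = conn.execute(
    "SELECT w.word, p.firstdoc, p.block FROM word w "
    "JOIN posting p ON p.wordnum = w.wordnum WHERE w.word = ?",
    ("日本語",),
).fetchall()
print(rows)
```

The <code>CHECK</code> constraint is there because SQLite does not enforce a declared column length by itself; an oversized block is rejected with <code>sqlite3.IntegrityError</code>.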
| |
|
| |
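
The design does not yet say how postings are encoded inside <code>block</code>. As one possible approach (an assumption, not the chosen format), the sketch below stores <code>firstdoc</code> separately and packs the remaining document ids as variable-byte encoded gaps. The function names are hypothetical, and position postings and the <code>flags</code> byte are left out.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_block(doc_ids):
    """Return (firstdoc, block) for a sorted list of distinct doc ids."""
    firstdoc = doc_ids[0]
    block = bytearray()
    prev = firstdoc
    for doc_id in doc_ids[1:]:
        block += encode_varint(doc_id - prev)  # store the gap, not the id
        prev = doc_id
    if len(block) > 255:
        raise ValueError("postings do not fit in one 255-byte block")
    return firstdoc, bytes(block)


def decode_block(firstdoc, block):
    """Invert encode_block: rebuild the doc id list from the gaps."""
    doc_ids = [firstdoc]
    value = shift = 0
    for byte in block:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            doc_ids.append(doc_ids[-1] + value)
            value = shift = 0
    return doc_ids


firstdoc, block = encode_block([10, 13, 300, 301])
print(firstdoc, block.hex(), decode_block(firstdoc, block))
```

Small gaps take one byte each, so frequent words with dense posting lists fit many documents into a single row.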
| To Do
| |
| # We might need one or two more tables to rank efficiently
| |
| # Check whether SQLite has a varbinary data type. It definitely has a BLOB data type.
| |
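
The second item can be checked directly from Python. SQLite accepts almost any name as a declared column type, so <code>VARBINARY(255)</code> is allowed, but it is not a distinct storage type and the length is not enforced; binary values are stored as BLOBs either way. A minimal check, with a throwaway table <code>t</code> invented for the purpose:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both declarations are accepted; only the declared type name differs.
conn.execute("CREATE TABLE t (as_varbinary VARBINARY(255), as_blob BLOB)")

payload = bytes(300)  # deliberately longer than the declared 255
conn.execute("INSERT INTO t VALUES (?, ?)", (payload, payload))

# typeof() reports the storage class actually used for each value.
types = conn.execute(
    "SELECT typeof(as_varbinary), typeof(as_blob) FROM t"
).fetchone()
lengths = conn.execute(
    "SELECT length(as_varbinary), length(as_blob) FROM t"
).fetchone()
print(types, lengths)
```

Since the declared length is ignored, a size limit on <code>block</code> would have to be enforced by the application or by a <code>CHECK</code> constraint.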
|
| |
| Note that the table structure is subject to change to improve efficiency. New tables might be added, and columns might be added to or removed from the existing tables.
| |
|
| |
|
| == Detailed Design ==