Exercise 1.2 Consider these documents:
Doc 1 breakthrough drug for schizophrenia
Doc 2 new schizophrenia drug
Doc 3 new approach for treatment of schizophrenia
Doc 4 new hopes for schizophrenia patients
a. Draw the term-document incidence matrix for this document collection.
Term | Doc 1 | Doc 2 | Doc 3 | Doc 4 |
Approach | 0 | 0 | 1 | 0 |
Breakthrough | 1 | 0 | 0 | 0 |
Drug | 1 | 1 | 0 | 0 |
For | 1 | 0 | 1 | 1 |
Hopes | 0 | 0 | 0 | 1 |
New | 0 | 1 | 1 | 1 |
Of | 0 | 0 | 1 | 0 |
Patients | 0 | 0 | 0 | 1 |
Schizophrenia | 1 | 1 | 1 | 1 |
Treatment | 0 | 0 | 1 | 0 |
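The matrix can be checked mechanically. The following is a minimal Python sketch, assuming tokens are split on whitespace with no stemming or stop-word removal; the names `docs` and `incidence_matrix` are illustrative and not taken from the exercise.

```python
# The four documents from the exercise, keyed by document ID.
docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}

def incidence_matrix(docs):
    """Return (sorted terms, sorted doc IDs, one 0/1 row per term)."""
    doc_ids = sorted(docs)
    # Assumption: whitespace tokenization, no normalization beyond that.
    tokens = {d: set(text.split()) for d, text in docs.items()}
    terms = sorted(set().union(*tokens.values()))
    rows = [[1 if t in tokens[d] else 0 for d in doc_ids] for t in terms]
    return terms, doc_ids, rows

terms, doc_ids, rows = incidence_matrix(docs)
print("Term |", " | ".join(f"Doc {d}" for d in doc_ids), "|")
for term, row in zip(terms, rows):
    print(term.capitalize(), "|", " | ".join(map(str, row)), "|")
```

Each row has one entry per document, so a Boolean AND of two terms is a position-wise AND of their rows.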
b. Draw the inverted index representation for this collection, as in Figure 1.3 (page 7).
Term | Doc ID |
Breakthrough | 1 |
Drug | 1 |
For | 1 |
Schizophrenia | 1 |
New | 2 |
Schizophrenia | 2 |
Drug | 2 |
New | 3 |
Approach | 3 |
For | 3 |
Treatment | 3 |
Of | 3 |
Schizophrenia | 3 |
New | 4 |
Hopes | 4 |
For | 4 |
Schizophrenia | 4 |
Patients | 4 |
Term | Doc ID |
Approach | 3 |
Breakthrough | 1 |
Drug | 1 |
Drug | 2 |
For | 1 |
For | 3 |
For | 4 |
Hopes | 4 |
New | 2 |
New | 3 |
New | 4 |
Of | 3 |
Patients | 4 |
Schizophrenia | 1 |
Schizophrenia | 2 |
Schizophrenia | 3 |
Schizophrenia | 4 |
Treatment | 3 |
Term | Doc. Freq | Postings List |
approach | 1 | 3 |
breakthrough | 1 | 1 |
drug | 2 | 1, 2 |
for | 3 | 1, 3, 4 |
hopes | 1 | 4 |
new | 3 | 2, 3, 4 |
of | 1 | 3 |
patients | 1 | 4 |
schizophrenia | 4 | 1, 2, 3, 4 |
treatment | 1 | 3 |
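The three stages of the construction (list the term and docID pairs, sort them, then group them into postings lists with document frequencies) can be sketched in Python. This is only an illustration under the assumption of whitespace tokenization; `build_inverted_index` is a hypothetical helper name.

```python
from itertools import groupby

docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}

def build_inverted_index(docs):
    """Map each term to (document frequency, sorted postings list)."""
    # Step 1: (term, docID) pairs in order of appearance.
    pairs = [(term, doc_id)
             for doc_id, text in docs.items()
             for term in text.split()]
    # Step 2: sort by term, then docID; set() drops repeated pairs.
    pairs = sorted(set(pairs))
    # Step 3: group pairs by term into postings lists.
    index = {}
    for term, group in groupby(pairs, key=lambda pair: pair[0]):
        postings = [doc_id for _, doc_id in group]
        index[term] = (len(postings), postings)
    return index

index = build_inverted_index(docs)
for term, (doc_freq, postings) in index.items():
    print(f"{term} | {doc_freq} | {', '.join(map(str, postings))} |")
```

Because the pairs are sorted before grouping, every postings list comes out ordered by docID, which is what the merge algorithms rely on.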
Exercise 1.10
Write out a postings merge algorithm, in the style of Figure 1.6 (page 11), for an x OR y query.
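UNION(x, y)
answer ← ( )
while x ≠ NIL and y ≠ NIL
do if docID(x) = docID(y)
      then ADD(answer, docID(x))
           x ← next(x)
           y ← next(y)
      else if docID(x) < docID(y)
              then ADD(answer, docID(x))
                   x ← next(x)
              else ADD(answer, docID(y))
                   y ← next(y)
while x ≠ NIL
do ADD(answer, docID(x))
   x ← next(x)
while y ≠ NIL
do ADD(answer, docID(y))
   y ← next(y)
return answer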
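An x OR y merge has to keep every docID that occurs in either postings list, emit shared docIDs only once, and copy whatever remains of the longer list when the other runs out. A minimal Python sketch of that idea follows, assuming postings are plain sorted lists of integers rather than linked lists; the function name `union` is illustrative.

```python
def union(p1, p2):
    """Merge two sorted postings lists into their sorted union."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID in both lists: add it once and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            answer.append(p1[i])
            i += 1
        else:
            answer.append(p2[j])
            j += 1
    # One list is exhausted; the rest of the other belongs to the union.
    answer.extend(p1[i:])
    answer.extend(p2[j:])
    return answer

drug = [1, 2]
new = [2, 3, 4]
print(union(drug, new))  # [1, 2, 3, 4]
```

The merge visits each posting once, so it runs in time linear in the combined length of the two lists.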
Exercise 1.7
Recommend a query processing order for (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes) given the following postings list sizes:
Term | Postings size |
eyes | 213312 |
kaleidoscope | 87009 |
marmalade | 107913 |
skies | 271658 |
tangerine | 46653 |
trees | 316812 |
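Estimate the size of each OR group as the sum of its two postings sizes (an upper bound on the size of the union):
(kaleidoscope OR eyes): 87009 + 213312 = 300321
(tangerine OR trees): 46653 + 316812 = 363465
(marmalade OR skies): 107913 + 271658 = 379571
Process the groups in increasing order of estimated size: (kaleidoscope OR eyes) AND (tangerine OR trees) AND (marmalade OR skies).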
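A common heuristic for a conjunction of OR groups is to bound the size of each group by the sum of its postings sizes and then intersect the groups from smallest estimate to largest. The Python sketch below applies that heuristic to the sizes given in the exercise; the names `sizes`, `query`, and `or_upper_bound` are illustrative.

```python
# Postings list sizes from the exercise.
sizes = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 46653,
    "trees": 316812,
}

# Each tuple is one OR group; the groups are ANDed together.
query = [
    ("tangerine", "trees"),
    ("marmalade", "skies"),
    ("kaleidoscope", "eyes"),
]

def or_upper_bound(group, sizes):
    """Upper bound on the size of a union: the sum of its postings sizes."""
    return sum(sizes[term] for term in group)

# Smallest estimated group first, so intermediate results stay small.
order = sorted(query, key=lambda group: or_upper_bound(group, sizes))
for group in order:
    print("(" + " OR ".join(group) + ")", or_upper_bound(group, sizes))
```

The estimate is conservative: the true union can be smaller when the two terms share documents, but the sum needs no access to the postings themselves.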
Boolean query results on Google and Yahoo: the query submitted to Google was doraemon AND sinchan.
The results were better and more complete on google.com.
Term | Doc. Freq |
Approach | 1 |
Breakthrough | 1 |
Drug | 2 |
For | 3 |
Hopes | 1 |
New | 3 |
Of | 1 |
Patients | 1 |
Schizophrenia | 4 |
Treatment | 1 |