Search Engines Architecture Week 3

Week 3 Agenda

Lecture Session

Document Indexing
Web Crawling Techniques

Lab Session

For this lab, students should have already signed up to download Terrier from http://ir.dcs.gla.ac.uk/terrier. We can use the Desktop API as is, but for development we need Java installed on the local machine.

Lab instructions for using the API will be provided in class. Please read the Terrier documentation in advance. Bring a directory (folder) full of documents from the pupr.edu site, or from your favorite site, to experiment with; it will be analyzed during the lab.
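Since the lab starts by indexing a folder of documents, here is a minimal sketch of what building an inverted index over such a folder involves. It is plain Java and is not Terrier's API or index format. The class and method names (SimpleInvertedIndex, indexDirectory, postingsFor) are hypothetical, and the tokenization (lowercasing, then splitting on anything that is not a letter or digit) is a simplifying assumption. A real engine such as Terrier typically adds further steps like stopword removal and stemming.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical teaching sketch of an in-memory inverted index:
// term -> (document id -> term frequency). Not Terrier's API or index format.
public class SimpleInvertedIndex {
    private final Map<String, Map<String, Integer>> postings = new TreeMap<>();

    // Simplifying assumption: lowercase, then split on runs of non-letter/non-digit characters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Add each token of one document to that term's postings list, counting occurrences.
    public void addDocument(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Index every regular file under dir (recursively), using the file path as the document id.
    public void indexDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            addDocument(file.toString(), text);
        }
    }

    // Postings for a term, or an empty map if the term never occurred.
    public Map<String, Integer> postingsFor(String term) {
        Map<String, Integer> list = postings.get(term.toLowerCase(Locale.ROOT));
        return list == null ? Collections.emptyMap() : list;
    }

    public static void main(String[] args) throws IOException {
        SimpleInvertedIndex index = new SimpleInvertedIndex();
        if (args.length > 0) {
            // e.g. the folder of documents you bring to the lab
            index.indexDirectory(Paths.get(args[0]));
        } else {
            index.addDocument("doc1", "An inverted file maps terms to documents.");
            index.addDocument("doc2", "Terrier builds an inverted index.");
        }
        System.out.println("inverted -> " + index.postingsFor("inverted"));
    }
}
```

Run without arguments, it prints `inverted -> {doc1=1, doc2=1}`. Passing a folder path as the first argument indexes every file under it instead. A TreeMap keeps terms and document ids sorted, which makes the postings easy to inspect by eye; a production index would instead be compressed and stored on disk.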

This lab report is due next week.

6 Comments

  1. Greetings, Professor,

    We also have to submit the labs in digital format, but should it be a compressed file so we can transfer it from one flash drive to another?

    Thanks.

  2. Hello Professor,

    I’ve been playing around with Terrier, searching the contents of the collection, and have learned how to index a specific collection. While the desktop_terrier.bat searches return the correct results, I haven’t been able to do the same with interactive_terrier.bat. It keeps returning “No results”.

    I renamed the /etc/terrier.properties file and modified it to suit my document paths. However, with or without modifying the file, it does not return results.

    I have searched the Terrier wiki and forums, and even Google, but I haven’t found a solution. Can you shed any light on this issue?

    Thanks,
    Gina

  3. Hi, Gina:

    Try this:

    1. Index some files with Terrier Desktop and then do a search.
    2. Keeping this one open, try using the interactive_terrier.bat for the same search.

  4. Hi, Gina:

    I just ran a fresh search for “inverted file” without any problems. This is what I did. I’m using the Windows version of Terrier from a removable USB drive, and I tested it on both Vista and XP.

    1. Ran Terrier, indexing its own documentation.
    2. Searched for “inverted file” as a query.
    3. Double-clicked the bin/interactive_terrier.bat file.
    4. Searched for “inverted file” as a query.

    After the standard headers, I got:

    Set TERRIER_HOME to be J:\terr
    WARNING: The file terrier.prop
    rrier\etc\terrier.properties
    Assuming the value of terrier
    INFO – time to intialise index
    Please enter your query: inverted file

    Displaying 1-82 result
    0 326 112 2.908176030920721
    1 753 471 2.8691891125133053
    2 426 212 2.776936903771998
    3 759 477 2.747474445016238
    4 424 210 2.691801832968892
    5 745 463 2.478234538724751
    6 741 459 2.4410476260123852
    7 734 452 2.4026089191907287
    8 1000 665 2.351714630117981
    9 427 213 2.3437764460259336
    10 402 188 2.3294244235277524
    11 975 640 2.2693842354980465
    12 425 211 2.2625182087305125
    13 422 208 2.2102952746378635
    14 301 87 2.067004898225863
    15 429 215 2.06550056347662
    16 548 279 2.024848127420526
    17 245 78 2.024848127420526
    18 197 75 2.007975485950941
    19 76 40 1.9954047945790332
    20 404 190 1.979005166160501
    21 703 421 1.9562281732712443
    22 339 125 1.9536831254119893
    23 434 220 1.9235037768768704
    24 756 474 1.7579234589850747
    25 28 10 1.7505691014173455
    26 439 225 1.7188073021593937
    27 406 192 1.6978730490041303
    28 436 222 1.6698926255833755
    29 747 465 1.6273903541804449
    30 338 124 1.5987128520302278
    31 22 5 1.5618933140413125
    32 19 2 1.4967606279394443
    33 751 469 1.480498642467794
    34 124 63 1.4586748548300594
    35 437 223 1.4486166892075465
    36 749 467 1.424296235607222
    37 468 254 1.3813768603244259
    38 752 470 1.3636373922215335
    39 304 90 1.2695069276928859
    40 707 425 1.2595481602180154
    41 968 633 1.2427295999967598
    42 340 126 1.238697939749856
    43 758 476 1.2361806398396333
    44 718 436 1.2171117485149614
    45 112 51 1.2044548267644286
    46 303 89 1.191849419444905
    47 1013 678 1.1900218043921074
    48 716 434 1.175694715373152
    49 305 91 1.1144461445417837
    50 373 159 1.071949456510424
    51 350 136 1.0442178750271376
    52 120 59 1.032249223156422
    53 39 21 1.0289066140518082
    54 754 472 1.021245562355564
    55 431 217 1.0112196888981333
    56 40 22 0.9783128316736401
    57 33 15 0.952665942151352
    58 130 69 0.9323554684113402
    59 401 187 0.9298191911149045
    60 606 324 0.9031691289936831
    61 662 380 0.8991516469240083
    62 736 454 0.8835616760231381
    63 323 109 0.882295943944168
    64 126 65 0.8768326329975594
    65 403 189 0.8603341765453334
    66 667 385 0.853341277999745
    67 742 460 0.8361417417726925
    68 668 386 0.8064130750674726
    69 412 198 0.7467488439890779
    70 123 62 0.745745069118931
    71 1007 672 0.6855982352785376
    72 121 60 0.6414169643360372
    73 115 54 0.612631867415782
    74 717 435 0.5569985846392175
    75 117 56 0.5425438654855919
    76 128 67 0.5311998902670805
    77 118 57 0.4688456765488282
    78 127 66 0.44246701888818213
    79 113 52 0.42745882075231073
    80 125 64 0.34905485374098644
    81 75 39 0.20774592741474804
    Please enter your query:
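The output above is a ranked list: the first column is the rank, and the scores in the last column decrease from the first line to the last. As a rough illustration of how a query such as “inverted file” can be turned into such a list, here is a minimal Java sketch that sums a simple TF-IDF weight for each query term found in an inverted index. The class TfIdfRanker and its methods are hypothetical. TF-IDF is a stand-in chosen for simplicity and is not necessarily the weighting model Terrier used to produce the scores above, so the numbers will not match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: rank documents for a multi-term query (e.g. "inverted file")
// by summing a simple TF-IDF weight per matching term over an inverted index.
// This is not necessarily Terrier's weighting model, so scores will differ from Terrier's.
public class TfIdfRanker {
    private final Map<String, Map<String, Integer>> postings = new HashMap<>(); // term -> (doc -> tf)
    private int numDocs = 0;

    // Same simplifying tokenization assumption: lowercase, split on non-letters/digits.
    private static String[] terms(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    public void addDocument(String docId, String text) {
        numDocs++;
        for (String term : terms(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
            }
        }
    }

    // Documents matching at least one query term, highest score first.
    public List<Map.Entry<String, Double>> search(String query) {
        Map<String, Double> scores = new HashMap<>();
        for (String term : terms(query)) {
            Map<String, Integer> list = postings.get(term);
            if (list == null) {
                continue; // term never indexed (or an empty token)
            }
            double idf = Math.log((double) numDocs / list.size()); // rarer terms weigh more
            for (Map.Entry<String, Integer> p : list.entrySet()) {
                double tf = 1 + Math.log(p.getValue());              // dampened term frequency
                scores.merge(p.getKey(), tf * idf, Double::sum);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        TfIdfRanker ranker = new TfIdfRanker();
        ranker.addDocument("d1", "inverted file inverted index");
        ranker.addDocument("d2", "an inverted file");
        ranker.addDocument("d3", "web crawling techniques");
        List<Map.Entry<String, Double>> results = ranker.search("inverted file");
        // Print "rank doc score", loosely mirroring the layout of the output above.
        for (int i = 0; i < results.size(); i++) {
            System.out.println(i + " " + results.get(i).getKey() + " " + results.get(i).getValue());
        }
    }
}
```

In this toy collection, d1 ranks above d2 because it contains “inverted” twice, and d3 is left out entirely because it matches neither query term.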
