Search Engines Architecture Week 4
Week 4 Agenda
Lecture Session
Building an Inverted Index
Incorporating Query Modes
Building the Term-Doc Matrix from the Inverted Index
Lab Session
Programming the Parser
Programming the Inverted Index
April 11, 2008 at 11:05 am
Hi Prof.:
We’ve been working with the Perl spider for a while now, and while it seems to work (at least it runs!!!), it doesn’t seem to be working right. It never gets the right results (it doesn’t get any results, that is). I’d document what I do understand about the code, but there’s not much I can do for now.
G. Nieves
April 11, 2008 at 11:51 am
Since this is a hands-on course, any roadblock is a learning lesson.
Try other alternatives.
1. Google for other Perl scripts (there are plenty).
2. Try C code such as the old Lycos Scoutget code.
http://robot-club.com/lti/lycos/scoutget.html
3. Try any PHP or Java based spider.
Here is some simple, bare-bones pseudocode. The crawler should do this:
1. Get a web page via an HTTP request (it can be via AJAX) and save it to a directory.
2. Scrape links from that page.
3. Pick a link from the page and go back to step 1.
Modifications:
In step 2, the program adds the scraped links to a list of links, and in step 3 it picks a link from that list instead of from the page.
Once we have saved enough documents to the directory, we can index them with Terrier.
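
The crawl loop described above, including the link-list modification, might be sketched in Python roughly as follows. This is only an illustration, not the course's spider: the function names (fetch, scrape_links, crawl), the out_dir and max_pages parameters, and the docNNNN.html file naming are all made up for the example, and the regular-expression link scraper is a simplification that a real HTML parser would handle more robustly. It also does not check robots.txt or pause between requests, which a crawler pointed at real sites should do.

```python
import os
import re
from collections import deque
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen

# Simplified link pattern (an assumption for this sketch): quoted href values only.
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch(url):
    """Step 1, first half: get a web page via an HTTP request."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_links(base_url, html):
    """Step 2: scrape links and resolve them against the page's own URL."""
    links = []
    for href in LINK_RE.findall(html):
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(seed_url, out_dir, max_pages=20, fetch_page=fetch):
    """Save up to max_pages pages into out_dir; return the URLs saved, in order."""
    os.makedirs(out_dir, exist_ok=True)
    frontier = deque([seed_url])  # the "list of links" from the modification
    seen = {seed_url}             # avoid queueing the same URL twice
    saved = []
    while frontier and len(saved) < max_pages:
        url = frontier.popleft()  # step 3: pick a link from the list
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue              # skip URLs that raise a network or URL error
        # Step 1, second half: send the page to the directory.
        path = os.path.join(out_dir, "doc%04d.html" % len(saved))
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        saved.append(url)
        for link in scrape_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return saved
```

Calling crawl with a seed URL and a directory name, for example crawl("http://example.com/", "docs", max_pages=50), would fill that directory with saved pages that can then be handed to the indexer. The fetch_page parameter exists only so the loop can be tried against canned pages without touching the network.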