Search Engines Architecture – Lecturer: Dr. Edel Garcia
Code: CECS 7804/31 Special Topics in C&E
Time: Saturday 8:00 A.M. – 12:00 N, Spring 2008
Room: Software Testing Lab, L-210
Academic Calendar: http://www.pupr.edu/academiccalendar/ac-wi05.pdf
Final Examination Date: May 24, 2008

Description: This is a hands-on, full-semester course on search engine architecture and algorithms. Each class consists of a lecture and a lab session. Students are expected to build and test their own search engines and related components on a server dedicated to this purpose.

Course Communication: All communications, including updates or changes to this syllabus to fulfill class needs, answers to questions, and clarifications, will be made available through this blog. Posts will be indexed in the Search Engine Architecture Course category, so students must read this category on a regular basis. To access the category, click on the link listed in the Categories section at the right of this blog's home page.

Classroom Policies: Taping is not allowed. Lecture Notes are not published online. Students are required to take notes in the traditional way. All lecture material is copyrighted.

Important Note: Students registered in this course automatically receive an invitation to present the projects they develop during the course at the Search Engines and Information Security Conference, to be held at Polytechnic University, San Juan Campus, on October 3 and 4, 2008. Contact Dr. Garcia at admin@miislita.com for additional details or questions regarding the conference.

Target: Students in Business, Engineering, Computer Sciences, and other disciplines are encouraged to register for this special course.

Requirements: Permission from advisor or department and knowledge of matrix algebra.

Grading: Grades are based on weekly lab reports, project presentations, and a final exam. The following scoring system will be used:

Course Grade = Ave*(1 – w) + Fin*w

where

Ave = average of all lab reports and group projects, computed after the two lowest partial grades are dropped.

Fin = final exam score
w = an adjustable weight

The letter grade scale is as follows:

A = 100 – 90; B = 89 – 80; C = 79 – 65; D = 64 – 55; F = 54 – 0
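As an illustration only, the scoring system above can be sketched in a few lines of Python. The function names, the sample scores, and the value w = 0.25 are hypothetical and are not part of the course policy. The sketch also assumes that w lies between 0 and 1, and that each letter band is read as a lower bound (for example, any score of 80 or more but below 90 is a B), since the scale above lists whole numbers only.

```python
def course_grade(partials, final, w):
    """Course Grade = Ave*(1 - w) + Fin*w, with the two lowest partial grades dropped."""
    if not 0 <= w <= 1:
        # Assumption: the adjustable weight is a fraction between 0 and 1.
        raise ValueError("w must be between 0 and 1")
    if len(partials) <= 2:
        raise ValueError("need more than two partial grades to drop the two lowest")
    kept = sorted(partials)[2:]      # discard the two lowest partial grades
    ave = sum(kept) / len(kept)      # Ave: average of the remaining grades
    return ave * (1 - w) + final * w


def letter_grade(score):
    """Map a numeric score to a letter, treating each band as a lower bound (assumption)."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 65), ("D", 55)):
        if score >= floor:
            return letter
    return "F"


# Hypothetical student: five partial grades, a final exam score of 80, and w = 0.25.
score = course_grade([100, 90, 80, 50, 40], final=80, w=0.25)
print(score, letter_grade(score))
```

With these sample numbers the two lowest partial grades (40 and 50) are dropped, Ave is 90, and the course grade is 90*0.75 + 80*0.25 = 87.5, which falls in the B band.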

Topics: Topics to be covered include, but are not limited to, the following, though not necessarily in this order:

Linear Algebra Fast Track Tutorial: Brief tutorial on matrix operations with emphasis on vector theory and Singular Value Decomposition.

Parser Building: Use of regular expressions to build, test, and use a parser.

Crawler Building: Use of AJAX to build a client-side crawler and a dedicated server-side crawler.

Look-Up Directories: Implementing a look-up directory and pseudo site search tool.

Search Interfaces: Developing and testing search interfaces, their search modes, and advanced search features.

Index and Database Building: Data fragmentation and storing.

Sanitizing and Ranking Answer Sets: Filtering, de-duplicating, and ranking answer sets.

Textbook: There is no official textbook. Open source components will be used or developed by the students. However, the following reference books are recommended for research. Additional references and an extended syllabus will be provided in class.

Modern Information Retrieval (Baeza-Yates and Ribeiro-Neto; Addison Wesley).
Information Retrieval – Algorithms and Heuristics (Grossman and Frieder; Springer).