Week 2 Agenda:

1. The User-Machine Relevance Perception Gap (PPT presentation)
2. Introduction to Document Indexing (PPT presentation)
3. Linearization: markup removal
4. Tokenization: punctuation removal
5. Filtration: stopword removal
6. Stemming: suffix/prefix removal
7. Tools to approximate document linearization
8. Demonstration of Minerazzi software (early demo)
9. Take-Home Work 1: Document Gap Analysis
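
The four indexing steps in items 3 to 6 can be sketched as a small Python pipeline. This is only an illustrative sketch, not the course's or Minerazzi's implementation: the function names, the tiny stopword list, and the naive suffix list are all assumptions made for the example. The stemmer strips suffixes only and is not Porter's algorithm.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Hypothetical suffix list for a naive stemmer (suffixes only, no prefixes).
SUFFIXES = ("ing", "ed", "es", "s")


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def linearize(html):
    """Linearization: remove markup, keep the text in document order."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Tokenization: lowercase the text and drop punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_stopwords(tokens):
    """Filtration: discard tokens found in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Stemming: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(html):
    """Run the four steps in the order listed in the agenda."""
    return [stem(t) for t in filter_stopwords(tokenize(linearize(html)))]


print(index_terms("<p>The <b>indexing</b> of documents, explained.</p>"))
# prints ['index', 'document', 'explain']
```

Note that this linearizer keeps every text node, including the contents of script and style elements, so it only approximates what a reader actually sees on the page.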

Required Reading Material

IR Watch Newsletter, 2007-6: The User-Machine Relevance Perception Gap. This free back issue of the newsletter is available only to students taking the course.
http://www.useit.com/alertbox/reading_pattern.html
http://psychology.wichita.edu/surl/usabilitynews/91/eyegaze.html
http://www.miislita.com/fractals/keyword-density-optimization.html
https://irthoughts.wordpress.com/2007/05/09/keyword-density-the-devils-advocate/
https://irthoughts.wordpress.com/2007/05/07/keyword-density-kd-revisiting-an-seo-myth/
https://www.google.com/adsense/support/bin/answer.py?answer=17954
http://www.miislita.com/information-retrieval-tutorial/indexing.html
http://www.dcs.qmul.ac.uk/~mounia/CV/Papers/ker_ruthven_lalmas.pdf
