Search Engines Architecture Week 2

Week 2 Agenda

Lecture Session

Visualizing Matrix Operations
SVD and PCA Review
If we have time, I will start with:
Overview of Document Indexing and Ranking Algorithms
Breadth-First and Depth-First Web Crawlers
The Terrier Desktop Search Platform (Java)

Lab Session

Complete Lab 1. Please add the following instructions to the lab.

In Part 3, section 3.1.3, add the following task:

Compute the sum of the eigenvalues of A^T A and the trace of this matrix. Do the same for A_k^T A_k. Compare the results and draw some conclusions. What important property is confirmed?
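You can try the trace check outside SciLab first. Below is a minimal plain-Python sketch (no NumPy assumed) that does three things:

- forms A^T A for a small, hypothetical term-document matrix A;
- finds its eigenvalues with the cyclic Jacobi rotation method, a swapped-in technique rather than the lab's SVD calculator;
- prints the trace next to the eigenvalue sum.

The matrix and the helper names (transpose, matmul, trace, jacobi_eigenvalues) are illustrative assumptions.

```python
import math


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python matrix product a times b."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def jacobi_eigenvalues(sym, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in sym]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A J: update columns p and q ...
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # ... then A <- J^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


# Hypothetical 4-term x 3-document matrix (not the lab's matrix).
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]

AtA = matmul(transpose(A), A)
eigs = jacobi_eigenvalues([[float(x) for x in row] for row in AtA])
print("trace(A^T A)       =", trace(AtA))
print("sum of eigenvalues =", sum(eigs))
```

The same comparison applies to A_k^T A_k once A_k is built in the lab. Since A_k = U_k S_k V_k^T, it follows that A_k^T A_k = V_k S_k^2 V_k^T. Its nonzero eigenvalues are therefore the k largest squared singular values of A.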

In Part 3, section 3.1.4, add the following task:

Finally, column-normalize V_k^T and construct a similarity matrix from it. Extract scalar clusters from it. Compare them with the clusters extracted from A_k^T A_k. Explain your observations.
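One possible reading of this task, sketched in plain Python under stated assumptions:

- each column of V_k^T is a document in the k-dimensional space;
- column normalization scales each column to unit length;
- the similarity matrix is the normalized matrix's transpose times itself, which gives document-to-document cosines;
- scalar clusters are the cosines between rows of that similarity matrix. This is the usual scalar-cluster definition; check it against reference 3.

The V_k^T numbers are made up, not an actual SVD output, and the function names are hypothetical.

```python
import math


def column_normalize(m):
    """Scale every column of m (a list of rows) to unit Euclidean length."""
    n_rows, n_cols = len(m), len(m[0])
    norms = [math.sqrt(sum(m[r][c] ** 2 for r in range(n_rows))) for c in range(n_cols)]
    return [[m[r][c] / norms[c] for c in range(n_cols)] for r in range(n_rows)]


def column_gram(m):
    """Return M^T M: entry (i, j) is the dot product of columns i and j."""
    n_rows, n_cols = len(m), len(m[0])
    return [[sum(m[r][i] * m[r][j] for r in range(n_rows)) for j in range(n_cols)]
            for i in range(n_cols)]


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def scalar_clusters(sim):
    """Scalar-cluster matrix: cosine between every pair of rows of sim."""
    return [[cosine(sim[i], sim[j]) for j in range(len(sim))] for i in range(len(sim))]


# Hypothetical V_k^T with k = 2 and four documents (made-up numbers).
Vk_T = [[0.45, 0.60, 0.40, 0.52],
        [0.55, -0.20, 0.35, -0.72]]

S = column_gram(column_normalize(Vk_T))  # document-document cosine similarities
C = scalar_clusters(S)                   # cosine between rows of S
for row in C:
    print(["%.3f" % x for x in row])
```

Documents whose rows of S point in similar directions get scalar-cluster values near 1, even when their direct similarity in S is only moderate.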

In Part 4, section 4.1.1, add the following task:

Using EXCEL, reproduce the PCA example given by Smith in reference 4. Show all calculations.
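To check the EXCEL spreadsheet, here is a minimal plain-Python sketch of two-dimensional PCA. It follows the usual steps:

- subtract each dimension's mean;
- build the 2 x 2 covariance matrix with the n - 1 denominator;
- take its eigenvalues and eigenvectors, using the closed form for a symmetric 2 x 2 matrix, sorted by decreasing eigenvalue;
- project the mean-adjusted data onto the eigenvectors.

The data values and function names are hypothetical; substitute the data from Smith's tutorial.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest first."""
    if b == 0:
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
    half_trace = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    pairs = []
    for lam in (half_trace + radius, half_trace - radius):
        vx, vy = lam - d, b  # solves (C - lam I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


def pca_2d(xs, ys):
    # 1. Subtract the mean of each dimension.
    mx, my = mean(xs), mean(ys)
    adjusted = [(x - mx, y - my) for x, y in zip(xs, ys)]
    # 2. Covariance matrix entries.
    cxx, cxy, cyy = sample_cov(xs, xs), sample_cov(xs, ys), sample_cov(ys, ys)
    # 3. Eigenvalues and unit eigenvectors, sorted by decreasing eigenvalue.
    pairs = eig_sym_2x2(cxx, cxy, cyy)
    # 4. Project the mean-adjusted data onto each eigenvector (one row per component).
    final = [[v[0] * dx + v[1] * dy for dx, dy in adjusted] for _, v in pairs]
    return (cxx, cxy, cyy), pairs, final


# Hypothetical data; replace with the data set from Smith's tutorial.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
cov, pairs, final = pca_2d(xs, ys)
for lam, vec in pairs:
    print("eigenvalue %.6f  eigenvector (%.6f, %.6f)" % (lam, vec[0], vec[1]))
```

Eigenvector signs may come out flipped relative to EXCEL or an SVD calculator. A flipped sign only flips the sign of the projected values, not the variance they capture.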

Teaser: Consider the following lecture material list. Which trick is being used to reduce link juice (importance)? How would you add link juice?

Lecture Material

1. Using latent semantic analysis to improve access to textual information; Dumais, S. T., Furnas, G. W., Landauer, T. K., Deerwester, S., & Harshman, R. (1988). Proceedings of the Conference on Human Factors in Computing Systems (CHI), 281-286.
PDF

2. Indexing by Latent Semantic Analysis; Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990).
PDF

3. Association and Scalar Clusters Tutorial; Garcia, E. (2008).
PDF

4. A tutorial on Principal Components Analysis; Smith, L. (2002).
PDF

5. A tutorial on Principal Component Analysis; Shlens, J. (2003).
PDF

6. Do Your Worst to Make the Best: Paradoxical Effects in PageRank Incremental Computations; Boldi, P., Santini, M., & Vigna, S. (2007).
PDF

11 Responses to “Search Engines Architecture Week 2”

  1. panzernieves Says:

    Prof. :

    I checked the academic calendar, and according to it there are no classes this Saturday. Just letting you and the rest of the group know. Cheers!!!

    G. Nieves

  2. E. Garcia Says:

    Hi, Nieves:

    Thanks.

    Indeed we don’t have classes that week, according to the academic calendar over at http://www.pupr.edu/academiccalendar/ac-wi05.pdf

  3. E. Garcia Says:

    Dear Students:

    In Lab 1 Part 4, you need to provide a modified SciLab version of the PCA code. You also need to reproduce the example given in Lindsay Smith's tutorial. Use EXCEL and any SVD calculator.

    Be aware that the accepted formula for sample covariance uses n - 1 in the denominator, not n. If your version of EXCEL divides by n, you need to correct the results by multiplying them by n/(n-1).
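A quick way to see the correction is the plain-Python sketch below, which uses hypothetical data. It computes the covariance with both denominators and checks that multiplying the n version by n/(n-1) gives the n - 1 version. As far as I know, EXCEL's older COVAR worksheet function is one that divides by n. The covariance helper and its sample flag are illustrative names.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists.

    sample=True divides by n - 1 (the accepted estimator);
    sample=False divides by n, as a population formula does.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return s / (n - 1) if sample else s / n


# Hypothetical data, not taken from Smith's tutorial.
xs = [2.0, 4.0, 4.0, 5.0, 7.0]
ys = [1.0, 3.0, 2.0, 5.0, 6.0]
n = len(xs)

pop = covariance(xs, ys, sample=False)  # what an n-denominator formula gives
corrected = pop * n / (n - 1)           # multiply by n/(n-1)
print(pop, corrected, covariance(xs, ys))
```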

    I am putting together a tutorial on PCA to help you.

    Cheers

  4. ageigel Says:

    Here is a list of problems I have found so far in the Scilab exercise that can be remedied:

    * There are extra ' symbols inserted in the functions that cause syntax errors.

    * M-file style comments (the % convention) give errors; I had to replace them with //.

    * All functions must end with endfunction. I had to add these to the code.

    * There is no ASCII-to-number function in Scilab. The only alternative I found is scanf, which in C/C++ can cause lots of overflow problems.

    * The Scilex GUI is unstable, and linking functions to Tcl libraries gives errors at compile time.

    These are other errors I have run into so far and have not been able to solve. Does anybody have a recommendation?

    * In Shlens' code, the repmat substitute (the ones command suggested at http://www.scilab.org/product/dic-mat-sci/M2SCI.htm) produces matrix-multiplication consistency errors.

    * In Lindsay's program, the line finaleigs = eigenvectors(:,1:dimensions); gives errors. I looked up the function eigenvectors in Scilab, and it appears neither on the site nor in her code.
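On the repmat question, assume, as in Matlab, that mn is the M x 1 column of per-dimension means of an M x N data matrix. Then repmat(mn,1,N) and mn*ones(1,N) produce the same M x N matrix, because an (M x 1) times (1 x N) product is M x N. A mean stored as a 1 x M row, or a ones(N,1) factor, gives inconsistent dimensions.

The plain-Python sketch below mimics both forms on hypothetical data. It is an illustration of the shapes, not Scilab code. The helper names are hypothetical.

```python
def repmat_col(mn, n):
    """Analogue of Matlab repmat(mn, 1, N) for an M x 1 column mn: gives M x N."""
    return [[value] * n for value in mn]


def col_times_ones(mn, n):
    """Analogue of Scilab mn * ones(1, N): (M x 1) times (1 x N) gives M x N."""
    ones_row = [1.0] * n
    return [[value * one for one in ones_row] for value in mn]


def subtract_row_means(data):
    """Mean-center each row (dimension) of an M x N data matrix."""
    m, n = len(data), len(data[0])
    mn = [sum(row) / n for row in data]  # M x 1 vector of row means
    tiled = col_times_ones(mn, n)        # same shape as data: M x N
    return [[data[i][j] - tiled[i][j] for j in range(n)] for i in range(m)]


# Hypothetical data set: 2 dimensions x 4 trials.
data = [[1.0, 2.0, 3.0, 6.0],
        [2.0, 2.0, 4.0, 4.0]]
centered = subtract_row_means(data)
print(centered)
```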

  5. luisjaniel Says:

    Greetings, Dr. García,

    I am writing to ask whether the entire Lab has to be submitted using the template you sent, including the parts that specify EXCEL.

    Thank you.

  6. E. Garcia Says:

    Hi, Luis:

    It is easier if you provide the EXCEL files and reference them in the report as doc-1.xls, doc-2.xls, etc. Place all files, documents, etc. in a single folder and zip it. Also provide a hard copy of the report.

  7. panzernieves Says:

    Hi,

    I did Exercise 4 using the provided code. I copied and pasted it into Matlab and had to make minor corrections before I could use it. I didn't use Scilab for this problem, but I will try the code in Scilab tomorrow, make corrections if necessary, and tell you whatever I find out.

    Cheers!!!

    G. Nieves

  8. panzernieves Says:

    Hi,

    Still having problems with:

    1) [PC, V] = eig(covariance) using Scilab. Well, I found a possible solution for this:

    “The Spectrum of a Matrix is the set of all eigenvectors of a matrix”

    That is why I suggest you use

    [PC, V] = spec(A);

    instead. If you're using Scilab, this function yields the same results (mathematically speaking, that is) as [PC, V] = eig(covariance) does in Matlab!!!

    For more information just type help spec in the scilab environment.

    2) If you need to display your results with two decimals, just type format bank in Matlab or format 5 in Scilab (thanks to A. Paris for this one).

    3) All functions need to end with endfunction, so be careful with the code. I'm still working on this.

    More info? Just keep on blogging (is that even a word/verb?).

    Cheers!!!

    G.Nieves

  9. panzernieves Says:

    Correction and more suggestions:

    Spectrum := set of all eigenvalues of a given matrix.

    Matlab:
    [PC, V] = eig(covariance);

    Scilab:
    [PC,V] = spec(covariance);

    To use a user-built function, you need to load it first. Once the function is written, type in the Scilab environment:

    getf("C:\…whatever path…\cov.sce");

    In this case I was trying to load my function cov.

    Once loaded simply type:

    >>[c] = cov(A);

    PS: If there is an error in the code, fix it and save and load the function again!!!

    Cheers!!!

    G. Nieves

  10. panzernieves Says:

    Hi,

    To test the pca1 program given in the papers, I advise you to use the example given in the PCA and SPCA Tutorial by Dr. Garcia.

    To obtain the same results as in the tutorial, type:

    [signals, PC, V] = pca1(data);

    where

    1) data is the transpose of X (first matrix in tutorial, where columns are ordered Weight, Height, Age)
    2) PC is equivalent to V in the tutorial,
    3) V is the diagonal elements of matrix S in the tutorial.

    Not sure what signals really means, but it sure ain't Step 4's YV!!!
    Can anybody shed some light on this matter???

    Cheers!!!

    G. Nieves

  11. E. Garcia Says:

    Hi, Gerardo:

    Double-check the naming conventions. Shlens and Smith also use different naming conventions. We can discuss it tomorrow in class.

    Cheers
