SVD has been applied in many fields, including IR, economics, computational chemistry, and biocomputation. In all these cases, one must pay attention to how one populates the initial matrix to be “SVDied”.

The folks at igvita.com have a great discussion on an SVD Recommendation System in Ruby, complete with a simple example and nice code. I stopped by to chime in with some thoughts:

Quote starts here.

“The poster that goes by the name of Genghis has raised two good questions:

1. how to discriminate initial vector constituents.
2. why cosine angles and not distances are used.

I hope the following helps:

Addressing this from the IR side, early LSI papers describe SVD wherein initial term-doc matrix entries (aij) were defined as local weights (Lij)

aij = Lij = fij

where f is the frequency (occurrence) of term i in doc j.

This assumes that aij values are independent of one another and not limited by an upper bound.

The authors of those same early LSI papers soon recognized that performance is improved by incorporating global (Gi) and normalization (Nj) weights:

aij = Lij Gi Nj

wherein Gi is the weight of term i across all docs. Gi can be defined in terms of inverse frequencies or entropies. Nj is a normalization weight for doc j, usually set to 1.

Lij, on the other hand, can be constrained to a 0 to 1 interval using augmented logarithms or normalized augmented scales. Using raw counts for Lij implies that local weights are defined using a scale without an upper bound.

Lij, Gi, and Nj have been described using dozens of schemes. Since Lij, Gi, and Nj can each be defined to run within the 0 to 1 scale, the aij then adopt values between 0 and 1.

There are several advantages to defining global weights as entropies rather than as mere inverse frequencies, but that is outside the scope of this post. Many authors, including Berry, Dumais, and others, have used entropy for global weights.
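To make aij = Lij Gi Nj concrete, here is a minimal sketch in plain Python. It is only one of the many possible schemes, not the one used in any particular paper: I am assuming an augmented normalized frequency for Lij (0 when the term is absent, otherwise between 0.5 and 1), an entropy-based Gi (between 0 and 1), and Nj = 1. The function name weight_matrix and the toy counts are made up for illustration.

```python
import math

def weight_matrix(counts):
    """counts[i][j] is the raw frequency f_ij of term i in doc j.
    Returns a matrix of a_ij = L_ij * G_i * N_j (assumed scheme, see above)."""
    n_docs = len(counts[0])
    # Largest raw count in each doc, used to scale the local weight.
    col_max = [max(row[j] for row in counts) for j in range(n_docs)]
    weighted = []
    for row in counts:
        total = sum(row)
        if total == 0:
            g = 0.0          # term never occurs: its weight is irrelevant
        elif n_docs < 2:
            g = 1.0          # entropy is undefined for a single doc
        else:
            # Entropy global weight: 1 + sum_j p_ij log(p_ij) / log(n_docs)
            s = sum((f / total) * math.log(f / total) for f in row if f > 0)
            g = 1.0 + s / math.log(n_docs)
        n_j = 1.0            # normalization weight, set to 1 as in the text
        weighted.append([
            (0.5 + 0.5 * f / col_max[j]) * g * n_j if f > 0 else 0.0
            for j, f in enumerate(row)
        ])
    return weighted

# Toy term-doc counts: 3 terms (rows) by 4 docs (columns).
counts = [
    [1, 1, 1, 1],   # spread evenly over all docs
    [3, 0, 0, 0],   # concentrated in a single doc
    [1, 2, 0, 0],   # somewhere in between
]
a = weight_matrix(counts)
```

With this scheme a term spread evenly across every doc gets a global weight of 0, a term confined to one doc gets a global weight of 1, and every aij stays within the 0 to 1 scale.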

As given in the example above, initial entries were given without considering global and normalization weights. These are also given without an upper bound.

The problem with using only local weights is that the relative importance of a given object (or feature) against all other objects of the collection is taken out of the picture.

To begin with, not incorporating global weights makes it difficult to discern between, e.g., something like Obj(5,5,0,0,0,4), Obj(1,1,0,0,0,1), Obj(5,5,0,0,0,5), or even something like Obj(x, x, 0, 0, x), where x can be any number your heart desires, since

(a) we are not considering the relative importance of vector constituents (entries defining a vector) across the entire collection at hand.
(b) vector constituents don’t have an upper bound.
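One way to see the difficulty, as a rough sketch in plain Python: take the three six-entry Obj vectors above as raw, unweighted counts and compare them with two different measures. The helper names cosine and euclidean are mine, and this is just an illustration of the point, not a reconstruction of the igvita example.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# The raw-count vectors from the text.
obj_a = [5, 5, 0, 0, 0, 4]
obj_b = [1, 1, 0, 0, 0, 1]
obj_c = [5, 5, 0, 0, 0, 5]

cos_bc = cosine(obj_b, obj_c)      # same direction, different magnitude
cos_ac = cosine(obj_a, obj_c)      # nearly the same direction
dist_bc = euclidean(obj_b, obj_c)  # far apart as points
dist_ac = euclidean(obj_a, obj_c)  # close together as points
```

By angle, Obj(1,1,0,0,0,1) and Obj(5,5,0,0,0,5) are identical and Obj(5,5,0,0,0,4) is only slightly off; by distance, the ranking flips. With unbounded raw counts and no global weights, nothing in the data says which reading is the meaningful one.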

Regarding Genghis's second question, why cosine is used instead of distances, two good reasons (there are many) are:

(a) the cosine of the angle, when used as a similarity measure, actually reduces to a correlation coefficient when the vectors are mean-centered.

(b) If one still wants to map similarities to distances, this can be done: once similarities are known, they can be converted into distances. However, while a distance can be converted into a similarity measure, the reverse is not that obvious, because a distance metric must satisfy the triangle inequality.
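Both points can be checked numerically. The sketch below, in plain Python, compares the cosine of two mean-centered vectors with their Pearson correlation, and then converts a cosine similarity into a distance using sqrt(2(1 - cos)), which is the Euclidean distance between the two vectors after scaling each to unit length. That conversion is one choice among several, picked here because it is a genuine metric; the vectors u and v and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))

def centered(u):
    """Subtract the mean of the vector from each entry."""
    mean = sum(u) / len(u)
    return [x - mean for x in u]

def pearson(u, v):
    """Pearson correlation coefficient, computed from its definition."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    var_u = sum((x - mu) ** 2 for x in u)
    var_v = sum((y - mv) ** 2 for y in v)
    return cov / math.sqrt(var_u * var_v)

def chord_distance(u, v):
    """Distance derived from cosine similarity: sqrt(2 * (1 - cos))."""
    c = max(-1.0, min(1.0, cosine(u, v)))  # guard against rounding
    return math.sqrt(2.0 * (1.0 - c))

def unit(u):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]

# Hypothetical vectors for illustration.
u = [5, 5, 0, 0, 0, 4]
v = [1, 0, 2, 0, 3, 1]

centered_cos = cosine(centered(u), centered(v))
correlation = pearson(u, v)
d_from_sim = chord_distance(u, v)
d_between_units = math.sqrt(sum((x - y) ** 2
                                for x, y in zip(unit(u), unit(v))))
```

Here centered_cos and correlation agree up to rounding, and so do d_from_sim and d_between_units. Note that not every similarity-to-distance formula behaves this well; 1 - cos, for instance, is not guaranteed to satisfy the triangle inequality.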

BTW, when we explain SVD to students we often use aij = Lij, just to illustrate the basics of SVD and simplify classroom presentation. Then students are introduced to advanced term weight schemes for populating a term-doc matrix to be “SVDied”.

At least in IR, the framework

aij = Lij Gi Nj

is the one used with real SVD applications.

I hope this helps to clear things up.”

End of the Quote.

Note that I used the qualifier “At least in IR…” since there might be specific cases, as with SVD in computational chemistry, wherein one is forced to settle for a simpler representation of weights.

In computational chemistry, for instance, one uses vector space models, SVD, similarity measures, and other concepts from ancillary fields, but adapts these to the underlying chemical significance of the problem under consideration. This means that a chemist must look at the chemical principles involved and not just at the numerical analysis. In this sense, all these ancillary techniques from IR acquire some sort of useful meaning and “life”.
