Today, a reader (name removed to protect confidentiality) asked me:

My name is **** ****. I am working as a junior research fellow on a project in India. I read about the SVD techniques on the web page http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-3-full-svd.html#right-eigenvectors. I found it quite satisfactory; now I can understand how SVD works. But I have a query as follows.

Query:

As mentioned in the tutorial, we have to arrange these eigenvalues in descending order. Could you please tell me: if I put these values in ascending or arbitrary order, what will go wrong with the SVD?

Looking forward to your kind response.

Thanking you.

With best regards.

*******

My answer follows.

It depends on what you are trying to address.

SVD is used to identify singular values that can be interpreted as dimensions. When used as a dimensionality reduction technique, the largest N singular values are normally retained; keeping the smaller ones adds little. The largest singular values capture most of the information in the original data set, which makes this a noise minimization approach.

If the retention criterion is reversed and the smaller singular values are retained instead, the noisier dimensions are kept, and the reconstructed matrix is essentially a matrix of the hidden (latent) noise in the data. This is a noise maximization approach.

If the retention criterion is a random selection, the resulting reconstructed matrix may represent a data structure with randomized noise.

Which of these scenarios is appropriate depends on the original data under examination; the sketch below illustrates the three retention strategies on a toy matrix.
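
To make the three retention criteria concrete, here is a minimal NumPy sketch. The matrix, its singular spectrum, and the choice k = 2 are invented purely for illustration; the point is only to compare the reconstruction error of each strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a small matrix with a few strong directions and some
# weak (noisy) ones, just to make the comparison concrete.
A = rng.standard_normal((8, 6)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1, 0.05])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def reconstruct(keep):
    """Rebuild A using only the singular triplets whose indices are in `keep`."""
    mask = np.zeros_like(s)
    mask[keep] = s[keep]
    return U @ np.diag(mask) @ Vt

k = 2
strategies = {
    "largest":  np.arange(k),                   # noise minimization
    "smallest": np.arange(len(s) - k, len(s)),  # noise maximization
    "random":   rng.choice(len(s), size=k, replace=False),
}

for name, keep in strategies.items():
    err = np.linalg.norm(A - reconstruct(keep)) / np.linalg.norm(A)
    print(f"{name:8s} kept -> relative reconstruction error: {err:.3f}")
```

Retaining the largest values yields the smallest error, retaining the smallest values reconstructs mostly the latent noise, and a random selection lands somewhere in between, depending on which directions the draw happens to keep.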

In image compression, these approaches have already been explored. If the goal is a stability study and not just SVD dimensionality reduction, “the ratio between the highest singular value and the lowest singular value of the Jacobian matrix quantifies the spread of the Jacobian’s singular values, which in practice, reflects the extent of the solution’s instability with respect to small changes in the observation” (Horesh’s thesis).
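
That spread can be read directly off the singular values. A quick sketch, using a made-up, nearly rank-deficient matrix in place of a real Jacobian (which would come from linearizing a forward model):

```python
import numpy as np

# Hypothetical Jacobian; deliberately close to rank-deficient.
J = np.array([[1.00, 0.99],
              [0.99, 0.98]])

s = np.linalg.svd(J, compute_uv=False)  # singular values, descending
spread = s[0] / s[-1]                   # largest over smallest

print("singular values:", np.round(s, 6))
print(f"spread: {spread:.1f}")
# Equivalently, np.linalg.cond(J) computes this same ratio by default.
```

A large spread signals that small perturbations in the observation can be amplified enormously in the solution.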

Having said all that, we should not treat noise in a data set as something that must be discarded at all costs.

This is intimately linked with the so-called Inverse Problem. Incorporating the noise together with a priori SVD information can recover the complete information in a linear sense. Qianqian Fang has a beautiful PPT presentation, “Look Closer to Inverse Problem”, on the subject. If you want to visualize the matrix problem, this presentation is for you.
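
For a taste of how SVD information enters a linear inverse solve, here is a sketch of the classic truncated-SVD pseudoinverse. The system, the noise level, and the cutoff `tol` are all invented for illustration, and this is one standard regularization technique, not a method taken from Fang’s presentation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mildly ill-conditioned linear system A x = b with noisy data.
A = np.vander(np.linspace(0, 1, 10), 4, increasing=True)
x_true = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ x_true + 0.01 * rng.standard_normal(10)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Discard singular values below a noise-level cutoff: inverting them would
# amplify the noise instead of recovering the signal.
tol = 1e-2
s_inv = np.where(s > tol, 1.0 / s, 0.0)
x_hat = Vt.T @ (s_inv * (U.T @ b))

print("true:    ", x_true)
print("estimate:", np.round(x_hat, 3))
```

The cutoff is where the a priori knowledge about the noise enters: directions of the solution space that the noisy data cannot resolve are simply left out of the reconstruction.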

I’m thinking of putting together a tutorial on the Singular Value Expansion (SVE) algorithm, if I ever find the time.

I hope this helps.
