Archive for the ‘Newsletters’ Category

IRW 2008-04: Principal Component Analysis (PCA)

April 16, 2008

PCA

Visualizing the two principal components of a data set.

 

The current issue of IR Watch - The Newsletter should be in your inbox during the day.

It is on Principal Component Analysis and covers the following:

Introduction
What is PCA
A Reaction Equation Approach
Computing PCA with SVD
A Practical Example
Applying SVD to the Covariance Matrix
Improving Results with SPCA
Beyond the Covariance Matrix
Conclusion
References
News, Research, and Events
Terms of Use and Copyright

Back Mapping Document Clusters to Terms

March 11, 2008
Back Mapping

Back Mapping is a simple technique that
elegantly identifies local and global topics.

Here is a sneak preview of the current issue of IR Watch - The Newsletter. It should be in subscribers’ inboxes during the day.

IR Watch 2008-03: Back Mapping Documents to Terms

Keywords: back mapping, documents, terms, association clusters, scalar clusters

In this issue:

Introduction
Revisiting Association and Scalar Clusters
On Back Mapping
Back Mapping Example
Extracting Association and Scalar Clusters
Examining Neighborhood-Induced Similarity
Back Mapping Document Clusters to Terms
Identifying Local and Global Topics
Conclusion
References
News, Research, and Events
Terms of Use and Copyright

IRW-2008-2 Sneak Preview

February 18, 2008

Scalar Clusters

Masking Effects in Similarity Matrices:
Topics with strong similarities can mask other topics,
making these invisible to the clustering process.

Here is the sneak preview of the current issue of IR Watch - The Newsletter. Due to academic duties, it is running late. It should be in subscribers’ inboxes during the day. The topic is Scalar Clusters and Back Mapping, and it is based on lecture material covered in the Web Mining course. The following topics are covered:

Introduction
On Association and Scalar Clusters
The Neighborhood Similarity Matrix, Mn
Extracting Association and Scalar Clusters
Examining Neighborhood-Induced Similarity
Back Mapping Term Clusters to Documents
Masking Effects in Similarity Matrices
Conclusion
References
News, Research, and Events
Terms of Use and Copyright

Not a subscriber? What are you waiting for?

Until Next Year

December 28, 2007

Well, this was an incredible year.

I participated in several international conferences, changed ISPs, and went back to teaching at the graduate school and to conducting academic research; I also gained new friends from all over the world.

Next year I have several conferences and activities to take care of, a new graduate course to teach in the spring, titled Search Engines Architecture, and a few consulting projects to attend to.

The IRW Newsletter should arrive in subscribers’ inboxes early: today.

I’m taking a few days off. Until Next Year.

Cheers,

Dr. E. Garcia

Preview of IRW-2008-1: Association Clusters

December 27, 2007

Here is a preview of IRW-2008-1.

In this issue of the IRW newsletter we discuss association clusters in the context of keyword discovery.

Introduction
What are Clusters?
A “Reaction” Equation Approach
The Term-Document Matrix
The Document-Term Matrix
The Term-Term Co-Weight Matrix
What is a Similarity Matrix?
A Nomenclature Convention
Computing the Similarity Matrix
Identifying Association Clusters
Association Clusters: Some Applications
News, Research, and Events
Terms of Use and Copyright

The material covered is an adaptation of one of my Web Mining Course lectures.

IRW 2007-12: Testing Your Web Mining Knowledge

December 7, 2007

The 2007-12 issue of IR Watch - The Newsletter has been sent.

In this Issue:

Introduction

General Concepts

Relevance Perception

Document Indexing

Association and Scalar Clusters

Document Classification

News, Research, and Events

Terms of Use and Copyright

This issue of IRW is different from previous issues, which consisted of full articles on specific IR topics. This time I want to provide a Practice Partial Test that I use in the graduate course I currently teach at pupr.edu, Web Mining: A First Course in Web Mining, Search Engines, and Business Intelligence.

I intentionally left out the correct answers.

IRW-2007-11: The K-Means Algorithm

November 14, 2007

By now, current subscribers should have received the November issue of IR Watch - The Newsletter. The following topics are covered:

Introduction

The K-Means Algorithm

Applications

Clustering by Features

K-Means Example

The Sum of Squared Error (SSE)

Selecting a Stopping Condition

Clustering by Cosine Similarities

The Initial Centroid Problem

Bisecting K-Means

Limitations of K-Means

From K-Means to K-Medoids

K-Means and Scaling

From Spherical K-Means to Fractal Clusters

Conclusion

News, Research, and Events

Terms of Use and Copyright

If you are working in this area, this issue will help you a lot. I’m currently advising a grad student working on a K-Means graduate project at the MS level and he found the issue really useful. If you are new to the topic, the material discussed also serves as a handy tutorial.

Constrained Co-Occurrence Searches

September 4, 2007

In the current issue of IRW, “Constrained Co-Occurrence Searches”, we described cc searching in its two variants: proximity searching and adjacency searching. The difference between these two ways of searching was explained and illustrated with a few examples.

A case was made against the indiscriminate use of the NEAR, proximity, and adjacency searching expressions. A 2005 cc searching algorithm proposed by a research group from the Office of Naval Research (ONR) was also investigated.

In addition, we compared Google’s tilde operator with cc searching. Contrary to SEO opinions, the former is not an LSI operator, but is used to conduct a lookup for synonyms; the latter allows users to discover on-topic, in-context terms.

In our tests we have found that discovery performance improves when cc searches are combined with Google commands like allintitle: and allinurl:, as in

allintitle: “car*insurance”

car rental insurance
car driver insurance
car accident insurance
car motor insurance
car dealers insurance

allinurl: “car*insurance”

…car-teacher-insurance…
…/car-accident/insurance-…
…/car-breakdown-insurance…
…Motor-Car-Import-Insurance…
…car-home-insurance.net…

To expand the text window, add a sequence of asterisks like this:

“car***insurance”

car insurance and home insurance
car rental loss damage insurance
car home and business insurance
car insurance young driver insurance
car life and commercial insurance

This allows users to retrieve documents wherein search terms are separated by at least three terms. To limit the search window to exactly three terms, the ONR algorithm has been suggested. The IRW issue discusses some advantages and limitations of this algorithm.
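The wildcard patterns above can be assembled programmatically when many term pairs need testing. The following is a minimal sketch, not a tool from the IRW issue: the helper name `cc_query` and its parameters are hypothetical, straight quotes stand in for the curly quotes shown above, and the function only formats a query string. It does not contact any search engine, and how a given engine interprets the asterisks is outside its scope.

```python
def cc_query(first, second, gap=1, command=None):
    """Format a wildcard query such as "car***insurance".

    first, second: the two search terms (hypothetical parameter names).
    gap: number of asterisks placed between the terms.
    command: optional prefix such as "allintitle" or "allinurl".
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    # Quote the phrase so the terms and wildcards are kept together.
    phrase = '"{}{}{}"'.format(first, "*" * gap, second)
    if command:
        return "{}: {}".format(command, phrase)
    return phrase


if __name__ == "__main__":
    print(cc_query("car", "insurance"))                        # "car*insurance"
    print(cc_query("car", "insurance", gap=3))                 # "car***insurance"
    print(cc_query("car", "insurance", command="allintitle"))  # allintitle: "car*insurance"
```

Generating the strings this way keeps the number of asterisks explicit, which makes it easier to compare answer sets across different window sizes.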

Possible applications include SERPs snippet optimization, keyphrase discovery, contextual targeting of terms, and advanced EF-Ratio calculations, amongst other applications. It is clear that Web Mining of answer sets is possible. On-topic analysis is here to stay.

Subscribe to IRW and stay ahead of the curve. Learn about research that normally does not reach mainstream.

Sneak Preview of IRW-2007-9

August 31, 2007

Constrained Co-Occurrence Searches

“Unlike AND and EXACT searches, constrained co-occurrence searches (cc searches) consist of searching within a text window wherein search terms are either unordered, as in proximity searching, or ordered, as in adjacency searching. Thus, cc searching is a contextual way of searching within similar neighboring terms.”
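The difference between the unordered and ordered modes can be illustrated on a local string. This is only a sketch under stated assumptions: text is tokenized on whitespace, the window is counted as the maximum number of terms allowed between the two search terms, and the function name `cc_match` is hypothetical. It is not the ONR algorithm discussed in the issue.

```python
def cc_match(text, first, second, window, ordered):
    """Return True if `first` and `second` co-occur within `window` terms.

    window: maximum number of terms allowed between the two search terms
            (an assumed definition for this sketch).
    ordered: True requires `first` before `second` (adjacency searching);
             False accepts either order (proximity searching).
    """
    tokens = text.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    for a in first_positions:
        for b in second_positions:
            # Ordered mode keeps the sign; unordered mode ignores it.
            distance = (b - a) if ordered else abs(b - a)
            # distance - 1 is the number of terms between the two matches.
            if 1 <= distance <= window + 1:
                return True
    return False


if __name__ == "__main__":
    print(cc_match("car rental insurance", "car", "insurance", 1, ordered=True))
    print(cc_match("insurance for your car", "car", "insurance", 2, ordered=True))
    print(cc_match("insurance for your car", "car", "insurance", 2, ordered=False))
```

In this sketch the second call fails because the terms appear in the reverse order, while the third call succeeds because the unordered mode only checks the distance between them.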

Here is a sneak preview of the September issue of IR Watch - The Newsletter.

IRW-2007-9: Constrained Co-Occurrence Searches

In this issue:

Introduction
AND Searches
EXACT Searches
CC Searches Defined
Proximity Searches
Automated Search Modes
Adjacency Searches
CC Searches at ONR
Differential CC Searches
Testing the ONR Algorithm
CC Searches and Keyword Research
CC Searches and Search Commands
CC Searches vs. Google’s Tilde Operator
Conclusion
References
IR Thoughts - News, Research, and Events
Terms of Use and Copyright

If you are a subscriber, this issue should be in your inbox during the day.

Random Notes

August 30, 2007

I’m putting the final touches to IR Watch, now in its first year of publication. I started the project a year ago. Thank you for your support.

Tomorrow I will post a sneak preview of the September issue. This one is about research conducted at the Office of Naval Research in the area of search modes. If you are a keyword researcher you need to read this issue.

I’m also researching a large repository of obscure databases, accessible through ftp. If you are a KDD researcher, you will love to know about these.

IRW 2007-7: Association Rules - Part 1

July 11, 2007

Association Rules

Association Rules based on co-occurrence can be used to address relationships like: Customers buying X tend to buy Y. These can be used to support business-related services such as marketing promotions, inventory, and CRM programs. Learn how by reading the July 2007 issue of IR Watch - The Newsletter.
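A rule of the form “customers buying X tend to buy Y” is commonly scored with support and confidence. The sketch below computes both for a single rule. The baskets are toy data invented for illustration, the function name `rule_metrics` is hypothetical, and whether the July issue uses exactly these two measures is an assumption.

```python
def rule_metrics(transactions, x, y):
    """Return (support, confidence) for the rule x -> y.

    support    = fraction of transactions containing both x and y
    confidence = fraction of transactions containing x that also contain y
    """
    if not transactions:
        raise ValueError("at least one transaction is required")
    total = len(transactions)
    with_x = sum(1 for t in transactions if x in t)
    with_both = sum(1 for t in transactions if x in t and y in t)
    support = with_both / total
    # Confidence is undefined when x never occurs; report 0.0 in that case.
    confidence = with_both / with_x if with_x else 0.0
    return support, confidence


if __name__ == "__main__":
    # Toy market baskets, invented for this example only.
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
    ]
    support, confidence = rule_metrics(baskets, "diapers", "beer")
    print("support = {:.2f}, confidence = {:.2f}".format(support, confidence))
```

The same counting applies to keyword research if each “basket” is taken to be the set of terms in a query or a document, which is one way co-occurrence data can feed rules of this kind.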


Random Notes

July 10, 2007

I’m putting the final touches to this month’s issue of IRW, which is running late for reasons all subscribers know by now. It should be out tomorrow.

Amazing how many are still perpetuating so many misconceptions about “LSI tools”. Here is another example, forwarded to me by Melissa Fach, one of several SEOs who are discovering how many “LSI-based” SEO lies are out there thanks to the usual suspects:

http://courtneytuttle.com/2007/07/05/taking-seo-to-the-next-level-lsi/


Sneak Preview of IR Watch

July 6, 2007

As mentioned, the July issue of IR Watch is running late due to the backend changes we made last week to our main site (http://www.miislita.com). If you are a subscriber, IRW should arrive in your inbox in a few days. This issue is dedicated to Market Basket Analysis and Keyword Research. Some portions are adaptations from Tan, Steinbach, and Kumar’s book “Introduction to Data Mining”.


IR Watch - The Newsletter

May 8, 2007

The goal of IR Watch - The Newsletter is to disseminate recent advances, research, and news from the information retrieval world. The current issue (IRW-2007-5) is a summary of my presentation at the OJOBuscador Congress 2 (March 8, 9 - Madrid, Spain),

Demystifying LSI for SEOs.
