If you are taking the Search Engines Architecture grad course, by now you should have learned what are the main components of a search engine and how to build a web crawler and a parser. You should know how to build an inverted index, how to use this to dynamically generate query-specific term-document matrices, and how to populate these with a variety of scoring models other than plain tf-IDF.

As the course progresses you will learn how to speed up document ranking through caching/updating  and divide-and-conquer multitiered strategies.

By now you should also have realized why most of the stuff published by SEOs about how search engines work are either misconception, myths, or just untrue folklore. Eg., While some have an incorrect idea on how vector space models are used, the bold idea that search engines do not use vector models to rank documents is simply non sense.

To illustrate visit the following two links:

http://www.searchenginepeople.com/blog/how-search-really-works-relevance-2-vector-space.html

http://www.atg.wa.gov/uploadedFiles/Home/News/Press_Releases/2004/Internet%20AdvancementComplaint.doc

The first one is about an SEO discussing “how search engines work” and use the Vector Space Model. The second is about the State of Washington suing a marketing company for misselling “search engine optimization” services.

How many factually incorrect statements/assumptions can you spot from the author of the first article and its commenters?

How many impossible facts and untrue statements can you spot in the second by the defendants?

If you have problems visiting the second link, I have a pdf copy for your perusal.

About these ads