Last year, a student taking my Search Engine Architecture course asked me whether client-side search utilities, such as JavaScript-based tools, could be used to grasp how search engines process queries.

My answer is given below:

Well, it depends on how these are scripted.

With few exceptions, those scripts teach you how to manipulate objects, constructors, etc.

The vast majority of these do not teach how to build the architecture of a search engine, such as its inverted index, or how, when a query is run, the inverted index is addressed and records are retrieved. Most so-called “JavaScript search engines” do not incorporate the creation of pseudo-documents, procedures for normalizing queries and URLs, attenuation of term frequencies, a valid ranking algorithm, crawling agents, a dispatcher, a query server, etc.
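To make the inverted index idea concrete, here is a minimal sketch in modern JavaScript. The function names (normalize, buildInvertedIndex, lookup), the sample documents, and the boolean AND matching are my own assumptions for illustration, not the design of any particular engine, and the syntax would need adapting for older browsers.

```javascript
// Hypothetical normalizer: lowercase the text and split on non-alphanumerics.
function normalize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build the index: term -> Map(docId -> raw term frequency).
function buildInvertedIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    for (const term of normalize(text)) {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      postings.set(docId, (postings.get(docId) || 0) + 1);
    }
  });
  return index;
}

// Address the index with a normalized query and return the ids of the
// documents that contain every query term (boolean AND, an assumption).
function lookup(index, query) {
  const terms = normalize(query);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const postings = index.get(term);
    if (!postings) return []; // one missing term empties an AND query
    const ids = new Set(postings.keys());
    result = result === null
      ? ids
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort((a, b) => a - b);
}

// Made-up sample documents.
const docs = [
  "Search engines build an inverted index.",
  "A site-search tool sorts pages.",
  "The inverted index maps terms to documents."
];
const index = buildInvertedIndex(docs);
console.log(lookup(index, "Inverted INDEX")); // [ 0, 2 ]
```

Note that the query is never compared against the documents themselves: it is normalized the same way the documents were, and only the postings of its terms are read.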

Many so-called “search engines” are just oversized site-search tools with a poor sorting subroutine that lacks relevance judgments or valid ranking criteria.
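For contrast with a plain sort, here is a small sketch of a ranking routine that attenuates term frequencies. I am assuming a common tf-idf variant, (1 + ln tf) × ln(N / df); the function names (tokenize, rank) and the sample pages are hypothetical, and a real engine would use a more elaborate scoring model.

```javascript
// Hypothetical tokenizer: lowercase the text and split on non-alphanumerics.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Score each page against the query with attenuated tf times idf.
function rank(pages, query) {
  const tokenized = pages.map(tokenize);
  const N = pages.length;
  const scores = new Array(N).fill(0);
  for (const term of new Set(tokenize(query))) {
    // Raw term frequency of this term in every page.
    const tfs = tokenized.map((tokens) => tokens.filter((t) => t === term).length);
    // Document frequency: how many pages contain the term.
    const df = tfs.filter((tf) => tf > 0).length;
    if (df === 0) continue;
    const idf = Math.log(N / df);
    tfs.forEach((tf, i) => {
      // Log attenuation: repeating a term gives diminishing returns.
      if (tf > 0) scores[i] += (1 + Math.log(tf)) * idf;
    });
  }
  return scores
    .map((score, docId) => ({ docId, score }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || a.docId - b.docId);
}

// Made-up sample pages.
const pages = [
  "index index index page",
  "index of ranking criteria",
  "contact page"
];
console.log(rank(pages, "index ranking").map((r) => r.docId)); // [ 1, 0 ]
```

Page 0 repeats “index” three times, yet page 1 ranks first because it matches both query terms and “ranking” is the rarer one. A routine that sorted by raw term counts would put page 0 on top.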

Thus, it seems like a nice student term paper project or challenge competition: build a 100% client-side search tool mimicking as many of the architectural components of a real search engine as possible, with nothing more than JavaScript and the IE browser. Here 100% means no ‘frankensteins’ like mixing JavaScript with other programming languages, additional platforms, or plug-ins.
