inverted index

The current issue of IRW is out!

This issue is Part Two of a three-part series on inverted index architectures, organized as follows:

Part One: Inverted Index Types
Part Two: Fast Indexing Techniques
Part Three: Fast Intersecting and Sharding

Tasks related to indexing, searching and processing are also discussed.

The QA section features short JavaScript one-liners aimed at helping readers understand what tokenization is and how it is implemented.
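To give a flavor of what such a snippet might look like, here is a minimal sketch of a tokenizer. It is not the code from the newsletter: the function name tokenize and the rule used (lowercase the text, then split on anything that is not an ASCII letter or digit) are assumptions made for illustration only.

```javascript
// Hypothetical tokenizer (an assumption, not the newsletter's code):
// lowercase the text, split on runs of characters that are not
// ASCII letters or digits, and drop any empty strings left over.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

console.log(tokenize("Fast indexing, fast intersecting!"));
// prints [ 'fast', 'indexing', 'fast', 'intersecting' ]
```

A real tokenizer would also have to decide what to do with accented characters, hyphenated words and stop words; this sketch ignores all of that.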

Although not described in the newsletter, it is possible to construct these types of components with scripting languages. As a matter of fact, we have built a complete forward index and inverted index entirely in JavaScript. Once computed, the inverted index can be kept in memory. This works for small collections. For larger collections, we write the index to a text file and read it back using ActiveX, and its posting lists are then intersected in the usual way. However, for really large collections this is not effective and a database solution is recommended. The point to be made is that constructing a JavaScript-based search engine at the client with real components, not a mere over-sized look-up “site search tool”, is possible. Since ActiveX is Microsoft’s land, it is not a universal solution. As a quick enterprise solution for small collections, it is ok, I guess.
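For readers who want to see the idea in miniature, the following is a rough sketch of an in-memory forward index, an inverted index built from it, and a posting-list intersection. It is not our actual implementation. All names (tokenize, buildForwardIndex, buildInvertedIndex, intersect, search) and the three sample documents are hypothetical, the tokenizer is deliberately naive, and the file storage step is left out.

```javascript
// Hypothetical tokenizer (assumption): lowercase, split on non-alphanumerics.
const tokenize = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Forward index: document ID -> list of tokens in that document.
function buildForwardIndex(docs) {
  const forward = new Map();
  docs.forEach((text, docId) => forward.set(docId, tokenize(text)));
  return forward;
}

// Inverted index: term -> posting list (array of document IDs).
// Documents are visited in increasing ID order, so each list stays sorted.
function buildInvertedIndex(forward) {
  const inverted = new Map();
  for (const [docId, tokens] of forward) {
    for (const term of new Set(tokens)) {
      if (!inverted.has(term)) inverted.set(term, []);
      inverted.get(term).push(docId);
    }
  }
  return inverted;
}

// Intersect two sorted posting lists with a two-pointer merge.
function intersect(a, b) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      result.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return result;
}

// AND query: intersect the posting lists of all query terms,
// starting with the shortest list to keep intermediate results small.
function search(inverted, query) {
  const lists = tokenize(query).map((term) => inverted.get(term) || []);
  if (lists.length === 0) return [];
  lists.sort((x, y) => x.length - y.length);
  return lists.reduce(intersect);
}

// Sample collection (made up for illustration).
const docs = [
  "Inverted index types",
  "Fast indexing techniques",
  "Fast intersecting and sharding of an inverted index",
];
const inverted = buildInvertedIndex(buildForwardIndex(docs));
console.log(search(inverted, "fast index")); // prints [ 2 ]
```

Keeping each posting list sorted by document ID is what makes the linear-time merge possible. In this sketch the whole index lives in a Map, which corresponds to the small-collection case described above.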