Our tutorial on standard errors is back! It is now available at
We have edited and updated the tutorial and added new material.
This is Part 3 of an introductory tutorial series on Term Vector Theory. The classic term frequency-inverse document frequency (TF-IDF) model is discussed, along with its advantages and limitations.
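For readers new to the model, here is a minimal sketch of the classic TF-IDF weighting scheme (tf × log(N/df)); the dictionaries and sample counts are illustrative, not taken from the tutorial:

```python
import math

def tfidf(term_counts, doc_freq, n_docs):
    """Classic TF-IDF weights: tf * log(N / df).

    term_counts: {term: raw count of the term in one document}
    doc_freq:    {term: number of documents containing the term}
    n_docs:      total number of documents in the collection
    """
    return {t: tf * math.log(n_docs / doc_freq[t])
            for t, tf in term_counts.items()}

# A term appearing in every document gets weight 0 (log(1) = 0),
# while rarer terms are boosted despite lower raw counts.
weights = tfidf({"zika": 3, "the": 10}, {"zika": 2, "the": 100}, 100)
```

Note how the common term "the" is zeroed out even though its raw count is higher; this discounting of ubiquitous terms is the main advantage of TF-IDF over a plain term-count model.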
The tutorial is available at
For more tutorials, visit
PS: Exercises were added to the tutorial and a few typos were removed.
This is Part 2 of our introductory tutorial series on Term Vector Theory as used in Information Retrieval and Data Mining. The Binary (BNRY) and Term Count (FREQ) models are discussed.
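As a quick illustration of the two models named above, here is a minimal sketch contrasting binary (BNRY) and term-count (FREQ) document vectors; the toy vocabulary and document are assumptions for the example:

```python
from collections import Counter

def term_count_vector(tokens, vocab):
    """FREQ model: each component is the raw count of a vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

def binary_vector(tokens, vocab):
    """BNRY model: 1 if the term occurs at all in the document, else 0."""
    present = set(tokens)
    return [1 if t in present else 0 for t in vocab]

vocab = ["search", "engine", "index"]
doc = "search engine search index".split()
# term_count_vector(doc, vocab) -> [2, 1, 1]
# binary_vector(doc, vocab)     -> [1, 1, 1]
```

The binary model records only presence or absence, while the term-count model preserves how often a term repeats; that lost repetition information is what motivates the move to weighted models such as TF-IDF.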
The tutorial is available at
1. Starting with our classic Term Vector Theory series, we are republishing our series of tutorials on Information Retrieval from the early and mid 2000s. See http://www.minerazzi.com/tutorials/
2. A new miner on the Zika Virus is now available online at http://www.minerazzi.com/zika/
3. Additional miners are listed at http://www.minerazzi.com/
Two legacy tutorials, written back in 2009 and aimed at those mining information security data, are now available at
This tutorial covers maximum transmission unit (MTU), maximum segment size (MSS), PING, NETSTAT, and fragmentation.
This tutorial covers IP fragmentation, data payloads, IP packet and header lengths, maximum transmission unit (MTU), and fragmentation offset (FO).
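To make the interplay of these quantities concrete, here is a hedged sketch of how a payload larger than the MTU is split into fragments; it assumes a 20-byte IP header with no options, and uses the standard rule that the Fragment Offset field counts 8-byte blocks:

```python
def fragment(payload_len, mtu, ip_header=20):
    """Split an IP payload into fragments for a given MTU.

    Each fragment's data length must be a multiple of 8 bytes
    (except the last fragment), because the Fragment Offset (FO)
    field counts 8-byte blocks. Returns a list of
    (offset_in_8_byte_units, data_len) pairs.
    """
    max_data = ((mtu - ip_header) // 8) * 8  # round down to an 8-byte multiple
    frags, offset = [], 0
    while offset < payload_len:
        data = min(max_data, payload_len - offset)
        frags.append((offset // 8, data))
        offset += data
    return frags

# A 4000-byte payload over a typical Ethernet MTU of 1500 bytes:
# fragment(4000, 1500) -> [(0, 1480), (185, 1480), (370, 1040)]
```

Each fragment carries at most 1480 data bytes (1500 minus the 20-byte header), and the FO values 0, 185, and 370 are byte offsets 0, 1480, and 2960 divided by 8.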
As a reviewer of journal manuscripts and conference papers, I normally look to see whether the piece before me answers the following questions:
1. WHAT-WHY: What is the scientific problem at hand and why is it important?
2. WHO-WHAT-WHY: Who proposed what previous solutions and why are these inadequate or incomplete?
3. WHAT-YOUR-WHY: What is your proposed solution and why is it better?
4. HOW-WHAT: How is the solution implemented and what are the benefits or practical applications?
5. PROS-CONS-WHAT: What are the possible pros and cons of your solution and what are the next areas of research?
I forgot to mention that I'm attending ICANN this week, so most will be legacy posts, published straight from the conference.
The ResearchChannel is a research consortium dedicated to serving as an online channel for the dissemination of cutting-edge technologies. If you want to learn the real stuff under the hood of search engines, do it through the ResearchChannel. Want to learn the difference between LSA (LSI) and LRA (Latent Relational Analysis)?
This is a great topic for a graduate thesis: traditional IR treats matching documents to a query as satisfying a single information need. However, since a system does not know what is in the mind of the user, the query itself can represent multiple information needs.
When you think it through, this is why Web searching (how users search and reformulate queries on the Web) differs from, for example, classic IR searching, wherein users resort to query expansion and relevance feedback mechanisms.
The distinction between the terms given in the title of this post is often unclear in the literature.
Closeness is a generic notion that can be expressed in terms of proximity, similarity or distance.
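The distinction can be made concrete with a small sketch: cosine similarity and Euclidean distance both express closeness, but in opposite directions and on different scales. The example vectors are assumptions for illustration:

```python
import math

def cosine_similarity(a, b):
    """Similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    """Distance: 0.0 for identical vectors, growing as they move apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0], [2.0, 4.0]
# b is a scaled copy of a: maximal similarity, yet nonzero distance.
# cosine_similarity(a, b)   -> 1.0 (up to floating point)
# euclidean_distance(a, b)  -> sqrt(5), about 2.236
```

The point is that "close" under a similarity measure (large values) is not the same as "close" under a distance measure (small values), so the generic notion of closeness must always be qualified by which measure is in use.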
Authors: Meenakshi Nagarajan, Amit Sheth; LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA, USA; Marcos Aguilera, Kimberly Keeton, Arif Merchant, Mustafa Uysal, HP Labs, Palo Alto, CA
This new study, presented at WWW2007, Banff, Canada, confirms the importance of co-occurrence, this time in relation to ontologies.
The abstract states: