As PUPR students know by now, the AIRWeb and Internet Engineering courses have been consolidated into a single course called Internet Engineering I (IE-I), which meets on Tuesdays.
This was a decision made strictly by the administration. Twelve graduate students are enrolled, a big number for a grad course. We are now in the fourth week of IE-I and I can tell that it is a lot of fun.
This coming Winter semester I’m scheduled to teach a new grad course called Advanced Search Engine Architecture (ASEA). Both IE-I and ASEA are hands-on. This means students need to get their hands and feet wet, not just learn the theory.
What we are trying to accomplish in IE-I is to understand how hackers and spammers use Internet architectures at the level of TCP/IP and Search Engines to game the system. I’ll open a special blog category for it during the week.
By the way, the first lecture (Lecture 1) was briefly summarized in the August 2009 issue of IR Watch. Tonight’s lecture (Lecture 4) covers the following:
IP Protocol (MAC and IP Mapping)
ICMP Protocol
ARP Hacking Attacks
ICMP Hacking Attacks
Firewall Fragmentation Offset Attacks
Meanwhile, ASEA is an expanded version of the Search Engine Architecture (SEA) course I’ve taught before. Students interested in registering can search this blog for the SEA category and check what we have covered in the past. This will give them an idea of what to expect from the Advanced SEA course. One thing I’m planning to do differently is to build an inverted index from scratch using AJAX. The most recent version of Terrier will also be used for testing and benchmarking experiments.
Last but not least, the September issue of IRW will be a bit delayed.
Hi there,
I have a few small questions, and I would be thankful if you could kindly answer them.
I’d like to know your thoughts on a problem. In the vector space model of IR, how can we process queries with OR or NOT logical operators between the query words? I think we could give a weight of “0” to a term that must not appear in the document. But in that case, it is still possible to retrieve a document that contains this word, even though we have given the term a weight of 0 in the query vector.
I’d like to expand my question: what if we have some user feedback indicating that the returned documents should not contain certain words, but this is not certain?
So how can we give more importance to some query words? I know we do that with weighting methods like tf-idf, but they are not based on user feedback…
One last question: could you kindly explain what happens if the significant words of a document are not frequent in that document but are frequent in other ones (in contrast to the tf-idf hypothesis)? Then they are not weighted as important by weighting methods based on term frequency. How can we weight them in a relevant way?
many thanks.
regards
Hi, Hassan:
Thank you for stopping by.
There are at least two ways of addressing OR query weights:
(a) Using the Boolean Model
(b) At the level of the inverted index
(a) For this question, a brief tutorial on the Boolean Model is given at http://www.miislita.com. To find it, just search the site for the keyword [boolean model].
(b) This question can be addressed at the level of the inverted index by identifying all posting lists containing at least one of the query terms and by not intersecting posting lists. The returned posting lists are used to construct the term-document matrix. Any flavor of scoring weights can be used to fill the cells. Vector analysis is then applied to the corresponding document vectors.
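To make option (b) concrete, here is a minimal Python sketch. The toy documents, the function names, and the choice of tf * log(N/df) weights with cosine similarity are all assumptions made for illustration; any other flavor of scoring weights could fill the cells instead.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def or_query(index, n_docs, query):
    """Rank documents containing at least one query term (OR semantics)."""
    # Keep only distinct query terms that actually occur in the index.
    terms = [t for t in set(query.lower().split()) if t in index]

    # Union of the posting lists; no intersection is taken.
    candidates = set()
    for t in terms:
        candidates.update(index[t])

    # Assumed weighting scheme: tf * log(N / df).
    idf = {t: math.log(n_docs / len(index[t])) for t in terms}
    q_vec = [idf[t] for t in terms]  # query tf taken as 1 per distinct term

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    scores = {}
    for doc_id in candidates:
        # One column of the term-document matrix, restricted to the query terms.
        d_vec = [index[t].get(doc_id, 0) * idf[t] for t in terms]
        scores[doc_id] = cosine(q_vec, d_vec)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "search engine architecture",
    "d2": "inverted index for search",
    "d3": "tcp ip protocol",
    "d4": "engine index benchmark",
}
index = build_index(docs)
ranked = or_query(index, len(docs), "engine index")
print(ranked)
```

In this sketch d3 never enters the candidate set because it contains neither query term, while d4, which contains both, should rank above d1 and d2. Restricting the document vectors to the query terms is a simplification; a fuller version would use every term in the candidate documents.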
tf-idf is used when we do not have relevance feedback from users. If we want to use relevance information, then we can use any flavor of the Robertson-Sparck Jones Probabilistic Model (RSJ-PM). In addition, you can tweak parameters using Okapi BM25. A tutorial on the RSJ-PM is also available at Mi Islita. To find it, search the site for the keyword [rsj-pm].
I hope this helps.
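As a rough illustration of how relevance feedback can change a term weight, the sketch below computes the Robertson-Sparck Jones weight in its usual form with 0.5 smoothing, and plugs it into a BM25-style term score. The function names and the sample counts are hypothetical, and k1 = 1.2 and b = 0.75 are just commonly quoted starting values, not tuned settings.

```python
import math


def rsj_weight(N, n, R=0, r=0):
    """Robertson-Sparck Jones term weight with 0.5 smoothing.

    N: documents in the collection    n: documents containing the term
    R: documents judged relevant      r: relevant documents containing the term
    With R = r = 0 (no feedback) this reduces to an idf-like weight.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5))
                    / ((R - r + 0.5) * (n - r + 0.5)))


def bm25_term(tf, doc_len, avg_doc_len, weight, k1=1.2, b=0.75):
    """BM25-style contribution of one term to a document score."""
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return weight * tf * (k1 + 1) / (tf + norm)


# Hypothetical counts: 100 documents, the term occurs in 10 of them.
no_feedback = rsj_weight(100, 10)
# The user marked 5 documents as relevant, and 4 of those contain the term.
with_feedback = rsj_weight(100, 10, R=5, r=4)
print(no_feedback, with_feedback)
```

With these counts the weight after feedback comes out higher than the weight without it, which is one way user judgments can give more importance to a query term. A term that shows up mostly in documents judged non-relevant would have its weight pushed down instead.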