Archive for the ‘Graduate Courses’ Category

AIRWeb Course Announcement

April 2, 2009

During the Fall of 2009, I will be teaching 

Adversarial Information Retrieval on the Web: A Graduate Course on Web Spam and Internet Vulnerabilities

This is a new, full-semester graduate course to be offered at Polytechnic University of Puerto Rico. It is based on the material presented at the annual AIRWeb Workshops. KDDM graduate students are encouraged to enroll. An early announcement and a preliminary syllabus are available at

http://www.miislita.com/courses/airweb-web-spam-syllabus.pdf

BTW, on November 5, 2008, PUPR became the first academic institution in the Caribbean to be certified by the Committee on National Security Systems (CNSS). Additional information is available at http://www.pupr.edu/ias.html

Their goal is to become a Center of Academic Excellence in Information Assurance Education (CAE/IAE). This is great news. Nationwide, how many universities do you know that belong to such an exclusive “club”?

Sneak Preview of IRW: Graduate Research

August 1, 2008

The current issue of IRW, Graduate Students Research, is out. It consists of short abstracts of research conducted by graduate students.

In this issue:

Introduction
Genetic Algorithms, K-Means, and Fuzzy C-Means
Word Association Patterns
U-Site Search Engine Interface
Enhancement of a U-Site Search Engine Interface
News, Research, and Events
Terms of Use and Copyright

The next issue will go back to its how-to mode.

IR Quiz

June 18, 2008

Here is a question I included in the final examination of the Search Engines Architecture course, slightly modified. It might serve as a little quiz for non-IR readers:

A collection consists of 500 documents. Some documents mention the keywords k1 and/or k2: 100 mention k1, 200 mention k2, 70 mention both k1 and k2, and 25 mention the k1 k2 term sequence. Calculate the number of results for the following queries, first assuming term independence and then assuming term dependence. If a calculation is not possible from the provided data, write NC (Not Computable).

1. k1 NOT k2

2. k2 NOT k1

3. k1 OR k2 (unconditional OR)

4. k1 OR k2 (conditional OR)

5. NOT k1

6. NOT k2

7. NOT (k1 AND k2)

8. k1 AND k2 NOT (k1 k2)

9. EF-Ratio of the k1 k2 terms sequence

10. c12-index of the k1 k2 terms sequence

11. c12-index of k1 AND k2

12. IDF of k1

13. IDF of k2

14. IDF of k1 AND k2

15. IDF of k1 k2 terms sequence

Total possible score: 15 points for correct results assuming term independence and 15 points for correct results assuming term dependence.

Grading yourself: A (100 – 90), B (89 – 80), C (79 – 70), D (69 – 60), F (59 – 0)

Correct answers will be given during the week.
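For readers who want to check the set arithmetic behind the Boolean items, here is a minimal sketch of one possible reading, not the official answer key. Assumptions: "k1 OR k2" is computed as the standard inclusion-exclusion union (the post's unconditional/conditional OR distinction is not modeled), "independence" means the expected co-occurrence n1·n2/N, IDF is taken as log10(N/df), and the EF-Ratio and c12-index items are left out because their definitions are not given here.

```python
import math

N = 500            # documents in the collection
n1, n2 = 100, 200  # documents mentioning k1 and k2
n12 = 70           # documents mentioning both k1 and k2 (observed data)
n_seq = 25         # documents mentioning the k1 k2 sequence

def boolean_counts(both):
    """Counts for the Boolean queries, given |k1 AND k2| = both."""
    return {
        "k1 NOT k2": n1 - both,
        "k2 NOT k1": n2 - both,
        "k1 OR k2": n1 + n2 - both,  # inclusion-exclusion union
        "NOT k1": N - n1,
        "NOT k2": N - n2,
        "NOT (k1 AND k2)": N - both,
    }

def idf(df):
    """Assumed IDF variant: log10(N / df); other variants exist."""
    return math.log10(N / df)

# Independence: expected co-occurrence is N * P(k1) * P(k2) = n1 * n2 / N.
both_indep = n1 * n2 / N

for label, both in [("independence", both_indep), ("dependence", n12)]:
    print(f"Assuming term {label}:")
    for query, count in boolean_counts(both).items():
        print(f"  {query}: {count:g}")
    print(f"  IDF of k1 AND k2: {idf(both):.4f}")

# Every document containing the sequence "k1 k2" also contains k1 and k2,
# so the sequence documents are a subset of the k1 AND k2 documents.
print("k1 AND k2 NOT (k1 k2), dependence:", n12 - n_seq)
print(f"IDF k1: {idf(n1):.4f}, IDF k2: {idf(n2):.4f}, "
      f"IDF k1 k2 sequence: {idf(n_seq):.4f}")
```

Note that NOT k1, NOT k2, and the single-term IDF values come out the same under both assumptions; only the items that depend on |k1 AND k2| change.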


Search Engines Architecture Week 10

May 16, 2008

Week 10 Agenda

Lecture Session

Other Inverted Index Architectures
Divide-and-Conquer Strategies for Fast Indexing and Searching
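As a rough illustration of the divide-and-conquer idea applied to indexing, the sketch below splits a toy collection into partitions, builds a partial inverted index for each one, and then merges the partial postings lists. It is a hypothetical example, not the specific strategies covered in the lecture; the function names and the tokenizer are assumptions. Threads keep the sketch short; because of Python's GIL, a real indexer would distribute partitions across processes or machines.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def tokenize(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_partial_index(partition):
    """Divide step: index one partition of (doc_id, text) pairs."""
    index = defaultdict(list)
    for doc_id, text in partition:
        for term in set(tokenize(text)):
            index[term].append(doc_id)
    return index

def merge_indexes(partials):
    """Conquer step: merge partial postings lists into one sorted index."""
    merged = defaultdict(list)
    for partial in partials:
        for term, postings in partial.items():
            merged[term].extend(postings)
    return {term: sorted(postings) for term, postings in merged.items()}

def build_index(docs, num_partitions=2):
    # Round-robin split of the collection into partitions.
    partitions = [docs[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(build_partial_index, partitions))
    return merge_indexes(partials)

docs = [(1, "web spam detection"), (2, "search engine spam"), (3, "web search engines")]
index = build_index(docs)
print(index["spam"])
print(index["web"])
```

Because each partition is indexed independently, the divide step scales with the number of workers, and the merge step only has to concatenate and sort postings per term.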

Lab Session

Lectures and Lab Review

Final Examination Notes

Next week we have the final examination. This is an open-book exam, with theory and practice sections.

To answer the test you need:

#2 pencil.
Calculator.
Working version of Terrier.
Tools developed during the course: parser, crawler, URL and query normalizers, stemmer, etc.
Laptop (or a PC will be supplied to you).

Search Engines Architecture Week 6

April 18, 2008

Week 6 Agenda

Lecture Session

Review on Regular Expressions
Lovins, Krovetz, and Porter Stemmers
Xu & Croft Stemmer (Word Co-occurrence-based Stemmers)
The Parallel Stemmer (LSI/Vector Space-based Stemmer)
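To connect the regular expressions review with stemming, here is a minimal sketch of only Step 1a of the Porter stemmer, the plural-stripping step, written with regular expressions. The full Porter algorithm adds Steps 1b through 5 with measure-based conditions, so this is a starting point for the lab, not a complete stemmer; the names STEP_1A_RULES and porter_step_1a are illustrative.

```python
import re

# Porter Step 1a rules, tried in order; the first matching suffix wins.
STEP_1A_RULES = [
    (re.compile(r"sses$"), "ss"),  # caresses -> caress
    (re.compile(r"ies$"), "i"),    # ponies -> poni
    (re.compile(r"ss$"), "ss"),    # caress -> caress
    (re.compile(r"s$"), ""),       # cats -> cat
]

def porter_step_1a(word):
    """Apply the first Step 1a rule whose suffix matches the word."""
    for pattern, replacement in STEP_1A_RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word)
    return word

for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The ordering matters: checking "ss$" before "s$" is what keeps words like "caress" from losing their final letter.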

Lab Session

Building a Stemmer

Comments: Lab 3 is due next week, but turning it in earlier is OK. Lab 4 and this lab (Lab 5) are similar in nature; we can negotiate their deadlines tomorrow in class. A reminder: all labs must be submitted in both hard-copy and electronic format to get full credit.

Search Engines for Penetration Testing Course

March 7, 2008

More on the topic of college education and search engines:

In the Caribbean we don’t have seasons, just early spring breaks compared to back in the States. After a 2-week spring break, the new trimester has just started. Tomorrow I am teaching the Search Engines Architecture course. This is going to be fun.

BTW, I am putting together a new graduate course for the next fall semester: Search Engines for Penetration Testing. I look forward to any suggestions from colleagues and from former or prospective students. This is going to be a nirvana for ethical hackers/spammers.

SEOs and University Education

February 29, 2008

Posters at SEOmoz are debating why the Internet is not taught in schools.

One poster claims: “I think all Universities are quite a ways off from this.” Others simply think this will never happen or that if it does, it will not be worth it.

These opinions are understandable, especially since universities have offered courses with “ecommerce”, “web marketing”, “ebusiness”, and similar terms in their titles, while most of these courses are soooo outdated. Many are limited to explaining what a cookie is, Bayesian and game theorems, and a few other topics that are not really that useful in the real world.

Here is a firsthand story. Back in 2002 I was hired by the Graduate School of Business of the University of Turabo in PR to teach the graduate course ECommerce Technology. It was the first time the course was offered as a core course for students pursuing a master’s degree. The problem: the syllabus and textbooks were sooo outdated, with case studies of companies that no longer existed. I was forced to redesign the entire course and its material.

Here is another firsthand story. Many students who took data mining at Polytechnic University of Puerto Rico (PUPR) before I was hired by the university complained that they did not learn anything useful because the lectures emphasized theory over practice. This is something I tried to avoid when I started teaching the Web Mining graduate course back in October. It is the same approach I use for my other courses.

As for studying the Internet as SEOmoz posters argue, it is not possible to study “The Internet”. When they say “Internet”, they probably mean studying search marketing, SEO, or Web Analytics. It looks like an opportunity for other marketers to make some money out of their peers’ ignorance.

I know there are some SEOs already trying to squeeze money from their peers by offering college-type courses taught by “experts”. Don’t be gamed by these folks. Their “colleges” and “institutes” are not certified by any higher education body, like the Middle States Association, or by research funding organizations like NSF. These mostly look like scams, and their diplomas are not even worth the ink they are printed with.

As for the above claim about teaching SEO in colleges, a number of traditional schools are already teaching up-to-date web marketing, design, usability, and even accessibility courses. In fact, more and more grad schools are developing Web Mining and Web Marketing courses.

At PUPR I’m in the curriculum development arena, developing and teaching the following hands-on courses, all at the graduate level:

Search Engines Architecture (Spring – classes start next week; lectures and lab)
Web Mining (Winter – semester just ended; lectures only) *
Search Engines for Penetration Testing and Intelligence (Fall – next fall; lectures and lab) **

* This was a course on Web Mining, Business Intelligence, and Search Engines. The agenda and syllabus are available online.

** I was just asked by the head of the EE&EC and CS Department to teach this one.

These ARE NOT paperless, online courses. The class meets in the computer lab building. We have plenty of computers and software to play with. I deliver all lectures using PowerPoint and smart boards. We study which Web business models work and which ones don’t. We examine case studies from the Web. We dissect SEO myths. We teach why and how search engine algorithms and web analytics work, etc., etc. I have grad students conducting projects or theses supported by grants from government agencies such as the DoD. Some of these projects interface with SEO, Web Analytics, Business Intelligence, and Homeland Security.

In addition, we have an upcoming conference on these topics (October). I’m also pushing for a 2-year certificate on Web Marketing & Analytics with a local college.

And how about AIRWeb, where, as scholars and researchers, we dissect and test search engine spam strategies and find new ways to neutralize, minimize, or “kill” these techniques, many of them promoted by some among you?

SEOs: We are definitely not oblivious to Web marketing and your “world”.

Search Engines Architecture Graduate Course

February 27, 2008

Search Engines Architecture – Lecturer: Dr. Edel Garcia
Code: CECS 7804/31 Special Topics in C&E
Time: Saturday 8:00 A.M. – 12:00 N, Spring 2008
Room: Software Testing Lab, L-210
Academic Calendar: http://www.pupr.edu/academiccalendar/ac-wi05.pdf
Final Examination Date: May 24, 2008

Description: This is a hands-on, full-semester course on search engine architecture and algorithms. Each class consists of lecture and lab sessions. Students are expected to build and test their own search engines and related components on a server dedicated for this purpose.

Course Communication: All communications, upgrades/changes to this syllabus to fulfill class needs, answers to questions, clarifications, etc. will be made available through this blog. Posts will be indexed in the Search Engine Architecture Course category, so students must read this category on a regular basis. To access the category, just click on its link in the Categories section on the right side of this blog’s home page.

Classroom Policies: Taping is not allowed. Lecture notes are not published online. Students are required to take notes in the traditional way. All lecture material is copyrighted.

Important Note: Students registered in this course automatically receive an invitation to present the projects they develop during this course at the Search Engines and Information Security Conference, to be held at Polytechnic University, San Juan Campus, on October 3 and 4, 2008. Contact Dr. Garcia at admin@miislita.com for additional details or questions regarding the conference.

Target: Students in Business, Engineering, and Computer Sciences and from other disciplines are encouraged to register for this special course.

Requirements: Permission from advisor or department and knowledge of matrix algebra.

Grading: Weekly Lab Reports, Project Presentations, and a Final Exam. The following scoring system will be used:

Course Grade = Ave*(1 – w) + Fin*w

where

Ave = average of all lab reports and group projects. The lowest 2 partial grades are eliminated.

Fin = final exam score
w = an adjustable weight

The letter grade scale is as follows:

A = 100 – 90; B = 89 – 80; C = 79 – 65; D = 64 – 55; F = 54 – 0
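The scoring rule above can be sketched as follows. This is an illustrative helper, not an official grade calculator: the function names are hypothetical, and it assumes fractional scores are compared against each band's lower bound (so 89.6 counts as a B) and that w lies between 0 and 1.

```python
def course_grade(partials, final_exam, w):
    """Course Grade = Ave*(1 - w) + Fin*w, after dropping the 2 lowest partial grades."""
    if not 0 <= w <= 1:
        raise ValueError("w must be between 0 and 1")
    if len(partials) <= 2:
        raise ValueError("need more than 2 partial grades")
    kept = sorted(partials)[2:]        # eliminate the lowest 2 partial grades
    ave = sum(kept) / len(kept)
    return ave * (1 - w) + final_exam * w

def letter_grade(score):
    """Scale: A 100-90, B 89-80, C 79-65, D 64-55, F 54-0 (lower bounds assumed)."""
    for cutoff, letter in [(90, "A"), (80, "B"), (65, "C"), (55, "D")]:
        if score >= cutoff:
            return letter
    return "F"

# Hypothetical student: six lab/project grades, final exam 80, w = 0.4.
grade = course_grade([85, 90, 60, 95, 70, 88], final_exam=80, w=0.4)
print(round(grade, 2), letter_grade(grade))
```

Here the 60 and 70 are dropped, Ave is 89.5, and the weighted result is about 85.7, a B. Raising w shifts more of the grade onto the final exam.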

Topics: The topics to be covered include, but are not limited to, the following (not necessarily in this order):

Linear Algebra Fast Track Tutorial: Brief tutorial on matrix operations with emphasis on vector theory and Singular Value Decomposition

Parser Building: Use of regular expressions to build, test, and use a parser.

Crawler Building: Use of AJAX to build a client-side crawler and a dedicated server-side crawler.

Look-Up Directories: Implementing a look-up directory and pseudo site search tool.

Search Interfaces: Developing and testing search interfaces, their search modes, and advanced search features.

Index and Database Building: Data fragmentation and storing.

Sanitizing and Ranking Answer Sets: Filtering, De-duplicating, and ranking answer set results.
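To give a flavor of the Parser Building topic, here is a minimal sketch of a regular-expression-based parser that strips script/style blocks and tags, decodes HTML entities, and tokenizes the remaining text. The patterns and the parse function are illustrative assumptions, not the parser built in class; regex-based HTML parsing is fragile on malformed markup, which is one reason students test their parsers against real pages.

```python
import html
import re

# Script and style blocks are removed together with their contents.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")        # any remaining tag
TOKEN_RE = re.compile(r"[a-z0-9]+")    # lowercase alphanumeric tokens

def parse(page):
    """Return the lowercase tokens found in the visible text of an HTML page."""
    text = SCRIPT_STYLE_RE.sub(" ", page)
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)         # e.g. &amp; -> &
    return TOKEN_RE.findall(text.lower())

page = ("<html><head><style>p {color: red}</style></head>"
        "<body><p>Web Spam &amp; Search</p></body></html>")
print(parse(page))
```

Replacing removed markup with a space rather than an empty string keeps words from adjacent elements from being glued into a single token.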

Textbook: There is no official textbook. Open source components will be used or developed by the students. However, the following reference books are recommended for research. Additional references and extended syllabus will be provided in class.

Modern Information Retrieval (Baeza-Yates and Ribeiro-Neto; Addison Wesley).
Information Retrieval – Algorithms and Heuristics (Grossman and Frieder; Springer).

Special Matrices You Should Know About

August 14, 2007

A reader asked me about matrices, so I referred him to my series of Tutorials on Matrices and IR. He then asked me about some special kinds of matrices. I answered his questions with some examples. He then replied with some analogies.


The Mathematics of Mind

June 7, 2007

IPAM is offering a graduate summer school program called “Probabilistic Models of Cognition: The Mathematics of Mind” from July 9 to 27, 2007. More information is available at http://ipam.ucla.edu/programs/gss2007/

According to that page:
