Today’s Internet Engineering Part 1 course lecture will be on DNS Intelligence and how we can use DNS records to understand virus and worm attacks as well as remote network topologies. Quite handy these days.
Please check Lecture 8
If you are enrolled in the IE-Part 1 course, here is some reference material on Email Headers for today’s lecture:
Exposing email headers
http://www.abs-comptech.com/EmailHeaders.htm
Tracking the source of email spam
http://www.rahul.net/falk/mailtrack.html
How to read email headers
http://www.emailaddressmanager.com/tips/header.html
Reading the email header
http://antivirus.about.com/od/windowsbasics/a/emailheaders.htm
Reading email headers
http://www.tinhat.com/email/read_email_headers.html
Spamlinks: Reading email headers
http://spamlinks.net/track-trace-headers.htm
ACCC: Reading Email Headers
http://www.uic.edu/depts/accc/newsletter/adn29/headers.html
E-mail Headers and SMTP Commands
http://www.avolio.com/columns/E-mailheaders.html
All About Email Headers
http://www.stopspam.org/index.php?option=com_content&view=article&id=45&Itemid=56
Security Optimization Strategies in the Workplace
http://www.miislita.com/searchito/security-optimization-strategies.html
If you are a student enrolled in the Internet Engineering I graduate course, check the Lecture 7 update.
We will be covering email protocols such as SMTP, POP3, and IMAP. The exercise section covers email headers intelligence and email crawlers.
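As a rough illustration of the kind of header intelligence mentioned above, here is a minimal Python sketch that walks the Received headers of a message and lists the bracketed relay IPs from the oldest hop to the newest. The sample message, the host names, and the relay_ips helper are all made up for illustration (the addresses come from documentation-only ranges), and real headers can be forged or formatted differently, so treat this as a starting point rather than the exercise solution.

```python
import re
from email import message_from_string

# Hypothetical message; hosts and addresses are illustrative only.
SAMPLE = (
    "Received: from mail.example.net (mail.example.net [198.51.100.7])\n"
    "\tby mx.example.org with ESMTP; Tue, 6 Oct 2009 10:00:02 -0400\n"
    "Received: from [192.0.2.44] (unknown [192.0.2.44])\n"
    "\tby mail.example.net with SMTP; Tue, 6 Oct 2009 10:00:00 -0400\n"
    "From: sender@example.com\n"
    "To: student@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

# Matches an IPv4 address written in square brackets, e.g. [192.0.2.44].
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message):
    """Return the first bracketed IPv4 of each Received header, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    ips = []
    # Each relay prepends its own Received header, so the last one
    # in the message is the earliest hop.
    for hop in reversed(hops):
        match = IPV4_IN_BRACKETS.search(hop)
        if match:
            ips.append(match.group(1))
    return ips


if __name__ == "__main__":
    print(relay_ips(SAMPLE))
```

Only the Received lines added by servers you trust are reliable; anything below them may have been inserted by the sender.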
If you are a student enrolled in the Internet Engineering I graduate course, check the Lecture 6 update.
I will be covering all about DNS configuration files. For the hands-on exercise section, we will be using nslookup commands to snoop at all relevant records of remote Web domains.
Use nslookup/? to access the options helper
Use nslookup followed by ? in a different line to access the commands helper
To quit nslookup, press Ctrl+C or type either quit or exit.
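For students who prefer to script the non-interactive mode, the following Python sketch builds one nslookup command line per record type. The helper names, the list of record types, and the example.com domain are assumptions for illustration; run_nslookup simply shells out to whatever nslookup binary is installed and returns its raw output, which differs between Windows and BIND versions.

```python
import subprocess

# Record types chosen for illustration; adjust to the exercise at hand.
RECORD_TYPES = ("A", "MX", "NS", "SOA", "TXT")


def build_nslookup_command(domain, record_type="A", server=None):
    """Build a non-interactive nslookup call: nslookup -type=TYPE name [server]."""
    command = ["nslookup", f"-type={record_type}", domain]
    if server:
        # Optional DNS server to query instead of the default resolver.
        command.append(server)
    return command


def run_nslookup(domain, record_type="A", server=None, timeout=10):
    """Run nslookup and return its raw text output (requires nslookup on PATH)."""
    result = subprocess.run(
        build_nslookup_command(domain, record_type, server),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the commands only; call run_nslookup() to actually query.
    for rtype in RECORD_TYPES:
        print(" ".join(build_nslookup_command("example.com", rtype)))
```

Printing the commands first makes it easy to compare them with what you would type at the prompt before letting the script query anything.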
The following are the lecture and exercise topics covered in the PUPR.edu core graduate course Internet Engineering, Part I. Students enrolled in the course might want to revisit this post as it will be updated.
Lecture 0
History of the Internet & Search Engines
Internet Basics
Lecture 1
RFCs (Request for Comments)
Network Types
IP (Internet Protocol)
Exercise 1 – RFCs, Network types, IP calculations
Lecture 2
OSI Reference Model
ARP
ICMP
Exercise 2 – IP-MAC Mapping, Prompt Commands (arp, ipconfig, nslookup)
Lecture 3
Man-in-the-Middle ARP Attacks
IGMP
IP Packets
Exercise 3 – Broadcast & Multicast IPs, Prompt Commands (netstat, ping, tracert, ipconfig, arp, nslookup)
Lecture 4
Fragmentation Offset
FO Overlapping Attacks
FO Gap Attacks
Tiny FO Attacks
TCP Protocol & Buffers
Exercise 4 – TCP buffers, Congestion Windows, Advertised Windows
Lecture 5
PING
PING of Death
Smurfing
TRACEROUTE-based Intelligence
Exercise 5 – Prompt Commands (arp, ipconfig, nslookup, netstat, ping, tracert)
Lecture 6
BIND & WINDOWS DNS (Domain Name Server)
Internet backbone root servers
Configuration Files
DNS Configuration Errors
Forward Lookup (Zone) Files
Reverse Lookup Files
Exercise 6 – Prompt Commands (interactive/non-interactive nslookup modes)
Lecture 7
SMTP
POP3
IMAP
Email Headers
Exercise 7 – Email Intelligence.
Lecture 8
DNS Intelligence
Using DNS records to understand Virus & Worm Attacks
Network Topology Intelligence from DNS records
Exercise 8 – DNS Intelligence
Lecture 9
General Review
Practice Test
Lecture 10
Final Exam, Oct 27
Course Grading System
8 out of 9 hands-on exercises count (worst exercise grade dropped)
1st partial exam = average of first 4 exercises
2nd partial exam = average of last 4 exercises
These amount to 75% of total grade
Final Exam amounts to 25% of total grade and it will be curved.
After that, total letter grade will be curved.
Course Letter Grades
A (100-89%)
B (88-77%)
C (76-60%)
D (59-50%)
F (49-0%)
As PUPR students know by now, the AIRWeb and Internet Engineering courses have been consolidated into a single course called Internet Engineering I (IE-I), which meets on Tuesdays.
This was a decision made strictly by the administration. Twelve graduate students are enrolled, a big number for a grad course. We are now in the fourth week of IE-I and I can tell that it is a lot of fun.
This coming Winter semester I’m scheduled to teach a new grad course called Advanced Search Engine Architecture (ASEA). Both IE-I and ASEA are hands-on. This means students need to get their hands and feet wet, not just learn the theory.
What we are trying to accomplish in IE-I is to understand how hackers and spammers use Internet architectures at the level of TCP/IP and Search Engines to game the system. I’ll open a special blog category for it during the week.
The first lecture (Lecture 1) was briefly summarized in the August 2009 issue of IR Watch. BTW, tonight’s lecture (Lecture 4) covers the following:
IP Protocol (MAC and IP Mapping)
ICMP Protocol
ARP Hacking Attacks
ICMP Hacking Attacks
Firewall’s Fragmentation Offset Attacks
Meanwhile, ASEA is an expanded version of the Search Engine Architecture (SEA) course I’ve taught before. Students interested in registering can search this blog for the SEA category and check what we have covered in the past. This will give them an idea of what to expect from the Advanced SEA course. One thing I’m planning to do differently is to build an inverted index from scratch using AJAX. The most recent version of Terrier will also be used for testing and benchmarking experiments.
Last but not least, the September issue of IRW will be a bit delayed.
During the Fall of 2009, I will be teaching
Adversarial Information Retrieval on the Web: A Graduate Course on Web Spam and Internet Vulnerabilities
This is a new full-semester graduate course to be offered at Polytechnic University of Puerto Rico. It is based on the material presented at the annual AIRWeb Workshops. KDDM graduate students are encouraged to enroll. An early announcement and preliminary syllabus are available at
http://www.miislita.com/courses/airweb-web-spam-syllabus.pdf
BTW, on November 5, 2008, PUPR became the first academic institution in the Caribbean to be certified by the Committee on National Security Systems (CNSS). Additional information is available at http://www.pupr.edu/ias.html
Their goal is to become a Center of Academic Excellence in Information Assurance Education (CAE/IAE). This is great news. Nationwide, how many universities do you know that are in such an exclusive “club”?
The current issue of IRW, Graduate Students Research, is out. It consists of short abstracts of research conducted by graduate students.
In this issue:
Introduction
Genetic Algorithms, K-Means, and Fuzzy C-Means
Word Association Patterns
U-Site Search Engine Interface
Enhancement of a U-Site Search Engine Interface
News, Research, and Events
Terms of Use and Copyright
The next issue will go back to its how-to mode.
Here is a question I included in the final examination of the Search Engines Architecture course. I have modified it a little. It might serve as a little quiz for non-IR readers:
A collection consists of 500 documents. Some documents mention k1 and/or k2 keywords. Suppose 100 mention k1, 200 mention k2, 70 mention both k1 and k2, and 25 mention the k1 k2 terms sequence. Calculate the number of results for the following queries, first assuming terms independence and second assuming terms dependence. If the calculation is not possible from the provided data, write NC, ‘Not Computable’.
1. k1 NOT k2
2. k2 NOT k1
3. k1 OR k2 (unconditional OR)
4. k1 OR k2 (conditional OR)
5. NOT k1
6. NOT k2
7. NOT (k1 AND k2)
8. k1 AND k2 NOT (k1 k2)
9. EF-Ratio of the k1 k2 terms sequence
10. c12-index of the k1 k2 terms sequence
11. c12-index of k1 AND k2
12. IDF of k1
13. IDF of k2
14. IDF of k1 AND k2
15. IDF of k1 k2 terms sequence
Total Possible Scores: 15 points for terms independence and 15 points for terms dependence correct results.
Grading Yourself: A (100 – 90), B (89 – 80), C (79 – 70), D (69 – 60), F (59 – 0)
Correct answers will be given during the week.
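For readers who want to check their arithmetic on the plain Boolean queries in the meantime, here is a small Python sketch based on set counting. It rests on two assumptions that are mine, not part of the exam: "terms dependence" is read as using the observed overlap of 70 documents, and "terms independence" is read as replacing that overlap with the expected value n1*n2/N. It does not attempt the conditional OR, the sequence query, the EF-Ratio, the c12-index, or the IDF items, since those depend on definitions given in class.

```python
def boolean_counts(n_docs, n1, n2, n12):
    """Result counts for simple Boolean queries, by inclusion-exclusion.

    n_docs: collection size; n1, n2: documents mentioning k1, k2;
    n12: documents mentioning both k1 and k2.
    """
    return {
        "k1 NOT k2": n1 - n12,
        "k2 NOT k1": n2 - n12,
        "k1 OR k2": n1 + n2 - n12,
        "NOT k1": n_docs - n1,
        "NOT k2": n_docs - n2,
        "NOT (k1 AND k2)": n_docs - n12,
    }


N, N1, N2, N12_OBSERVED = 500, 100, 200, 70

# Assumed reading of "dependence": use the overlap actually observed.
dependent = boolean_counts(N, N1, N2, N12_OBSERVED)

# Assumed reading of "independence": P(k1 AND k2) = P(k1) * P(k2),
# so the expected overlap is n1 * n2 / N.
expected_n12 = N1 * N2 / N
independent = boolean_counts(N, N1, N2, expected_n12)

if __name__ == "__main__":
    for query in dependent:
        print(f"{query}: independent={independent[query]}, dependent={dependent[query]}")
```

If your course notes define independence differently, only the expected_n12 line needs to change.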
Week 10 Agenda
Lecture Session
Other Inverted Index Architectures
Divide-and-Conquer Strategies for Fast Indexing and Searching
Lab Session
Lectures and Lab Review
Final Examination Notes
Next week we have the final examination. This is an open book exam, with theory and practice sections.
To answer the test you need:
#2 pencil.
Calculator.
Working version of Terrier.
Tools developed during the course: parser, crawler, url and query normalizers, stemmer, etc.
Laptop (or a PC will be supplied to you).
Week 6 Agenda
Lecture Session
Review on Regular Expressions
Lovins, Krovetz, and Porter Stemmers
Xu & Croft Stemmer (Word Co-occurrence-based Stemmers)
The Parallel Stemmer (LSI/Vector Space-based Stemmer)
Lab Session
Building a Stemmer
Comments: Lab 3 is due next week, but if you prefer to turn it in earlier, that is OK. Lab 4 and this lab (Lab 5) are similar in nature. We can negotiate their deadlines tomorrow in class. A reminder that all labs are due in both hard-copy and electronic format to get full credit.
More on the topic of college education and search engines:
In the Caribbean we don’t have seasons, just early spring breaks compared to back in the States. After a 2-week spring break vacation, the new trimester has just started. Tomorrow I am teaching the Search Engines Architecture course. This is going to be fun.
BTW, I am putting together a new graduate course for next fall semester: Search Engines for Penetration Testing. I look forward to any suggestions from colleagues and from former or prospective students. This is going to be nirvana for ethical hackers/spammers.
Posters at SEOmoz are debating why the Internet is not taught at schools.
One poster claims: “I think all Universities are quite a ways off from this.” Others simply think this will never happen or that if it does, it will not be worth it.
These opinions are understandable, especially when universities have offered courses with “ecommerce”, “web marketing”, “ebusiness” and similar terms in their course titles when most of these are soooo outdated. Many are limited to explaining what a cookie is, Bayesian and game theorems, and a few other topics that are not really that useful in the real world.
Here is a first-hand story. Back in 2002 I was hired by the Graduate School of Business of the University of Turabo in PR to teach the graduate course ECommerce Technology. It was the first time the course was offered as a core course for students pursuing a master’s degree. The problem: the syllabus and textbooks were sooo outdated, with case studies of companies that no longer existed. I was forced to redesign the entire course and material.
Here is another first hand story. Many students that took data mining at Polytechnic University of Puerto Rico (PUPR) before I was hired by the university were complaining that they did not learn anything useful because lectures emphasized theory and no practice. This is something I tried to avoid when back in October I started to teach the Web Mining graduate course. It is the same approach I use for my other courses.
As for studying the Internet as SEOmoz posters argue, it is not possible to study “The Internet”. When they say “Internet” they probably mean studying search marketing, SEO, or Web Analytics. It looks like an opportunity for other marketers to make some money out of their peers’ ignorance.
I know there are some SEOs already trying to squeeze money from their peers by offering college-type courses taught by “experts”. Don’t be gamed by these folks. Their “colleges” and “institutes” are not certified by any higher education body, like The Middle States Association, or by research funding organizations like NSF. These mostly look like scams and their diplomas are not even worth the ink on them.
As for the above claim of teaching SEO in colleges, there is a list of traditional schools already teaching updated web marketing, design, usability, and even accessibility courses. In fact, more and more grad schools are developing Web Mining and Web Marketing courses.
At PUPR I’m in the curriculum development arena, developing and teaching the following hands-on courses, all at the graduate level:
Search Engines Architecture (Spring – classes start next week; lectures and lab)
Web Mining (Winter – semester just ended; lectures only) *
Search Engines for Penetration Testing and Intelligence (Fall – next fall; lectures and lab) **
* This was a course on Web Mining, Business Intelligence, and Search Engines. The agenda and syllabus are available online.
**Just asked by the head of EE&EC and CS Dept to teach this one.
These ARE NOT paperless, online courses. The class meets in the computer lab building. We have plenty of computers and software to play with. I offer all lectures using PowerPoint and smartboards. We study which Web business models work and which ones don’t. We check case studies from the Web. We dissect SEO myths. We teach why and how search engine algorithms and web analytics work, etc. I have grad students conducting projects or theses supported by grants from government agencies like the DoD. Some of these projects interface with SEO, Web Analytics, Business Intelligence, and Homeland Security.
In addition, we have an upcoming conference on these topics (October). I’m also pushing for a 2-year certificate on Web Marketing & Analytics with a local college.
And how about AIRWeb, wherein, as scholars and researchers, we dissect and test search engine spam strategies and find new ways to neutralize, minimize, or “kill” these techniques, many promoted by some among you?
SEOS: Definitely, we are not oblivious to Web marketing and your “world”.
Search Engines Architecture – Lecturer: Dr. Edel Garcia
Code: CECS 7804/31 Special Topics in C&E
Time: Saturday 8:00 A.M. – 12:00 N, Spring 2008
Room: Software Testing Lab, L-210
Academic Calendar: http://www.pupr.edu/academiccalendar/ac-wi05.pdf
Final Examination Date: May 24, 2008
Description: This is a hands-on, full-semester course on search engine architecture and algorithms. Each class consists of lecture and lab sessions. Students are expected to build and test their own search engines and related components on a server dedicated to this purpose.
Course Communication: All communications, upgrades/changes to this syllabus to fulfill class needs, answers to questions, clarifications, etc. will be made available through this blog. Posts will be indexed in the Search Engine Architecture Course category. Thus, students must read this category on a regular basis. To access the category, just click on the link listed in the Categories section at the right of this blog’s home page.
Classroom Policies: Taping is not allowed. Lecture Notes are not published online. Students are required to take notes in the traditional way. All lecture material is copyrighted.
Important Note: Students registered in this course automatically receive an invitation to present their projects developed during this course at the Search Engines and Information Security Conference to be held at Polytechnic University, San Juan Campus during October 3 and 4, 2008. Contact Dr. Garcia at admin@miislita.com for additional details or questions regarding the conference.
Target: Students in Business, Engineering, and Computer Sciences and from other disciplines are encouraged to register for this special course.
Requirements: Permission from advisor or department and knowledge of matrix algebra.
Grading: Weekly Lab Reports, Project Presentations, and a Final Exam. The following scoring system will be used:
Course Grade = Ave*(1 – w) + Fin*w
where
Ave = average of all lab reports and group projects. The lowest 2 partial grades are eliminated.
Fin = final exam score
w = an adjustable weight
The letter grade scale is as follows:
A = 100 – 90; B = 89 – 80; C = 79 – 65; D = 64 – 55; F = 54 – 0
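The scoring rule above can be sketched in a few lines of Python. The function names and the example scores are hypothetical, and since the syllabus leaves w adjustable, the value 0.25 below is only a placeholder. The letter cutoffs treat each band's lower bound as inclusive, which is my assumption about how fractional scores such as 89.5 would be handled.

```python
def course_grade(partials, final_exam, w):
    """Course Grade = Ave*(1 - w) + Fin*w, with the lowest 2 partial grades dropped."""
    if len(partials) <= 2:
        raise ValueError("need more than two partial grades to drop the lowest two")
    if not 0 <= w <= 1:
        raise ValueError("w must be between 0 and 1")
    kept = sorted(partials)[2:]  # discard the two lowest lab/project grades
    ave = sum(kept) / len(kept)
    return ave * (1 - w) + final_exam * w


def letter_grade(score):
    """Map a numeric score to the A-F scale (lower bounds assumed inclusive)."""
    for cutoff, letter in ((90, "A"), (80, "B"), (65, "C"), (55, "D")):
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student: five partial grades, final exam 80, w = 0.25.
    score = course_grade([90, 80, 70, 100, 60], final_exam=80, w=0.25)
    print(score, letter_grade(score))
```

With these made-up numbers, the two lowest partials (60 and 70) are dropped, Ave is 90, and the course grade works out to 87.5, a B.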
Topics: Although not necessarily in this order, the topics to be covered include, but are not limited to, the following:
Linear Algebra Fast Track Tutorial: Brief tutorial on matrix operations with emphasis on vector theory and Singular Value Decomposition
Parser Building: Use of regular expressions to build, test, and use a parser.
Crawler Building: Use of AJAX to build a client-side crawler and a dedicated server-side crawler.
Look-Up Directories: Implementing a look-up directory and pseudo site search tool.
Search Interfaces: Developing and testing search interfaces, their search modes, and advanced search features.
Index and Database Building: Data fragmentation and storing.
Sanitizing and Ranking Answer Sets: Filtering, De-duplicating, and ranking answer set results.
Textbook: There is no official textbook. Open source components will be used or developed by the students. However, the following reference books are recommended for research. Additional references and extended syllabus will be provided in class.
Modern Information Retrieval (Baeza-Yates and Ribeiro-Neto; Addison Wesley).
Information Retrieval – Algorithms and Heuristics (Grossman and Frieder; Springer).
A reader asked me about matrices, so I referred him to my series of Tutorials on Matrices and IR. He then asked me about some special kind of matrices. I answered his questions with some examples. He then replied with some analogies.
IPAM is offering a graduate summer school program called: “Probabilistic Models of Cognition: The Mathematics of Mind” during July 9 – 27, 2007. More information is available at http://ipam.ucla.edu/programs/gss2007/
According to that link and quote: