Archive for the ‘Newsletters’ Category

Centering Data With Excel

July 10, 2009

The QA column of the current issue of IR Watch Newsletter has a great question that might help IR, CS, and stats students.

Q: Centering Data with Excel. In Excel, how do you center a data set?

A: To center a data set, use the STANDARDIZE function, which converts x values into z-scores; i.e.

z = (x – a)/s

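where a and s are, respectively, the arithmetic mean and standard deviation of the data. The formula below uses STDEV, so s is the sample standard deviation. Strictly speaking, this both centers the data (subtracts the mean) and scales it (divides by s). The following table emulates an Excel spreadsheet.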

 

     A           B              C
1    Age, x(A)   Weight, x(W)   Height, x(H)
2    64          57             8
3    71          59             10
4    53          49             6
5    67          62             11
6    55          51             8
7    58          50             7
8    77          55             10
9    57          48             9
10   56          42             10
11   51          42             6
12   76          61             12
13   68          57             9
14
15   z(A)        z(W)           z(H)
16   0.14        0.62           -0.44
17   0.92        0.92           0.61
18   -1.09       -0.55          -1.49
19   0.47        1.36           1.14
20   -0.86       -0.26          -0.44
21   -0.53       -0.40          -0.97
22   1.59        0.33           0.61
23   -0.64       -0.70          0.09
24   -0.75       -1.58          0.61
25   -1.31       -1.58          -1.49
26   1.47        1.21           1.67
27   0.58        0.62           0.09

Rows 2 – 13 contain the data set x(A), x(W), and x(H). In rows 16 – 27, the set was centered by typing in cell A16 the formula

 =STANDARDIZE(A2,AVERAGE(A$2:A$13),STDEV(A$2:A$13))

Copying this formula and pasting it into the remaining cells of A16 through C27 centers the data set. That was easy!
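For readers who want to double-check the spreadsheet outside Excel, here is a minimal Python sketch of the same calculation. It assumes the sample standard deviation (which is what STDEV returns); the function name standardize_column is made up for this illustration.

```python
from statistics import mean, stdev

def standardize_column(values):
    """Return z-scores, mirroring =STANDARDIZE(x, AVERAGE(col), STDEV(col))."""
    a = mean(values)   # arithmetic mean, like AVERAGE
    s = stdev(values)  # sample standard deviation (n - 1), like STDEV
    return [(x - a) / s for x in values]

# Age column, cells A2:A13 of the table above
age = [64, 71, 53, 67, 55, 58, 77, 57, 56, 51, 76, 68]
z_age = [round(z, 2) for z in standardize_column(age)]
print(z_age)  # compare with cells A16:A27
```

Swapping stdev for pstdev would give the population version, which produces slightly larger z-scores than the ones in the table.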

IRW-7-2009: Data Mining Texting

July 6, 2009

The current issue of IRW, the newsletter, is out.

Featuring Article:

Data Mining Texting
TTMD OMG MOS CU

“My parents send email, I text.” This illustrates the obvious: a digital divide between parents and teens. While parents are busy replying to email or, at most, blogging, their kids are probably busy developing their own language to alert their peers when mom or dad is trying to figure out what they are texting about. Did you know that MOS CU means ‘Mother over shoulder. See you’? And how about PW CUL (‘Parents watching. See you later’)?

Indeed… Texting is not just for teens:

Texting is not only revolutionizing the way business is conducted in 2009, but is also an emerging data mining playground. The number of behavioral patterns connected with texting is on the rise on different diffusion fronts: from sexting and sextcasting (transmission of conversations, videos, photos with sexual content) to dealing (transmission of conversations in connection with illegal drug activities), to encoding conversations about Wall Street transactions, industrial espionage, and so forth.

Random notes prior to 4th July weekend

July 3, 2009

As the 4th of July weekend approaches, here are some notes before heading off to planet Oblivious.

1. Yesterday we had an interesting business entrepreneur meeting with the CIO of the Government of Puerto Rico at El Palacio Rojo, Fortaleza.

2. IRW should be out by Monday. Main article: Data Mining Texting.

3. Only monkeys still believe in KD Myths. Ha, Ha.

Computing Co-Occurrence Matrices with Excel

June 5, 2009

The QA column of the current issue of IR Watch – The Newsletter features the following question:

Question: In Excel, how do you convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?

Answer:

Let A be a matrix populated with term occurrences (frequencies), with one row per term and one column per document.
Let A^T be its transpose.

Then, T = AA^T is a term-term co-occurrence matrix, and D = A^TA is a document-document co-occurrence matrix.

The following table emulates an Excel spreadsheet.

 

     A          B    C    D
1    A =        d1   d2   d3
2    t1         0    1    0
3    t2         0    0    1
4    t3         1    1    1
5
6    T = AA^T   t1   t2   t3
7    t1         1    0    1
8    t2         0    1    1
9    t3         1    1    3
10
11   D = A^TA   d1   d2   d3
12   d1         1    1    1
13   d2         1    2    1
14   d3         1    1    2

In the table, T was computed by selecting a destination array (B7:D9), entering in its first empty cell (B7) the formula =MMULT(B2:D4,TRANSPOSE(B2:D4)), pressing the F2 key and then the Ctrl+Shift+Enter keys.

Similarly, D was computed by selecting a destination array (B12:D14), entering in its first empty cell (B12) the formula =MMULT(TRANSPOSE(B2:D4),B2:D4), pressing the F2 key and then the Ctrl+Shift+Enter keys.

That was easy!

Note that none of these are similarity matrices. Can you tell why?
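As a cross-check of the MMULT results, the two products can be sketched in plain Python. The helper names transpose and matmul are made up for this illustration, and the matrix is the one in cells B2:D4.

```python
def transpose(m):
    """Swap rows and columns, like Excel's TRANSPOSE."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Multiply two matrices, like Excel's MMULT."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]

# Term-document occurrence matrix: rows t1..t3, columns d1..d3
A = [
    [0, 1, 0],  # t1
    [0, 0, 1],  # t2
    [1, 1, 1],  # t3
]

T = matmul(A, transpose(A))  # term-term co-occurrence
D = matmul(transpose(A), A)  # document-document co-occurrence

for name, matrix in (("T", T), ("D", D)):
    print(name)
    for row in matrix:
        print(row)
```

The printed rows can be compared cell by cell with B7:D9 and B12:D14 in the table.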

IRW-2009-6:Hackers: Taxonomy & Writing Styles

June 1, 2009

The current issue of IRW should reach subscribers' inboxes during the day or, at the latest, tomorrow.

In this issue:

  • Featuring article: Hackers: Taxonomy and Writing Styles
    Due to the increasing interest in developing Information Retrieval and Data Mining courses at the intersection of Information Security, this issue of the newsletter covers a brief taxonomy on hackers and their writing styles.
  • QA: Excel Matrix Multiplications: How to convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?
  • Historical Notes: Vacuum Tubes & Transistors
  • Who is Who in IR: Thomas K. Landauer
  • Top CS Departments: Dartmouth College
  • Outstanding Graduate Theses
  • Calls and Events
  • IR Blogs
  • and more…

Vector Normalization with Excel – Part II

May 7, 2009

Back in March, we explained how to normalize column vectors with Excel. But what about normalizing row vectors? This question is addressed in the current QA column of IRW. I think it might be useful to share the answer with readers, since many of them are students struggling with similar questions. So, here we go.

The following table emulates an Excel array consisting of three columns (A, B, and C) and six rows (1-6).

  A B C
1 1 2 3
2 4 5 6
3 7 8 9
4 0.27 0.53 0.80
5 0.46 0.57 0.68
6 0.50 0.57 0.65

Rows 1, 2, and 3 are row vectors. Rows 4, 5, and 6 are the corresponding normalized vectors, also known as unit vectors because their length is 1. To compute these, do as follows:

1. In cell A4, enter the formula =A1/(SQRT(SUMSQ($A1:$C1))). The result should be as given in this cell.

2. Copy this formula, select cells A5 and A6 and paste the formula in these.

 3. Finally, copy at once cells A4 through A6, select the remaining empty cells of the array, i.e., cells B4 through C6 and paste the formulas in these.
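The three steps above amount to dividing every element of a row by that row's length. A minimal Python sketch, using the same three row vectors (the function name normalize_row is made up for this illustration, and a row of all zeros would raise a division error):

```python
from math import sqrt

def normalize_row(row):
    """Divide each element by the row length, like =A1/SQRT(SUMSQ($A1:$C1))."""
    length = sqrt(sum(x * x for x in row))  # SQRT(SUMSQ(row))
    return [x / length for x in row]

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # rows 1-3 of the table above
unit_rows = [normalize_row(r) for r in rows]

for u in unit_rows:
    print([round(x, 2) for x in u])  # compare with rows 4-6
```

Each resulting row has length 1, which is easy to confirm by summing the squares of its elements.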

NSA/DHS Designates PUPR as a CAE

May 5, 2009

As blogged yesterday, the current issue of IRW should reach subscribers' inboxes today. The Top CS Departments column features Polytechnic University of Puerto Rico, where I teach graduate courses. As mentioned a few days ago, PUPR has been designated a CAE. This is great news that is making a splash across academic centers within the U.S., the Caribbean region, and Latin America whose mission is research relevant to homeland security.

Associate Director for Computer Science, Dr. Alfredo Cruz, sent me an official announcement, which I am reproducing below.

Polytechnic University of Puerto Rico (PUPR) is Designated National Center of Academic Excellence in Information Assurance Education by NSA and DHS

PUPR was recently designated as a National Center of Academic Excellence in Information Assurance Education (CAE/IAE) by the National Security Agency (NSA) and the Department of Homeland Security (DHS) on April 22, 2009. The goal of these centers is to reduce the vulnerability of the national information infrastructure by promoting higher education and research in Information Assurance (IA) and Security through the development of a growing number of professionals with IA expertise in various related disciplines.

PUPR will be recognized as the first institution in Puerto Rico to be designated as a CAE/IAE on June 3, 2009 in Seattle, Washington. Dr. Alfredo Cruz from the Department of Electrical & Computer Engineering and Computer Science will be present to receive the designation. He is the Director of the Center of Information Assurance for Research and Education (CIARE) at PUPR. Dr. Cruz is the person responsible for this designation.

PUPR is one of the very few Hispanic-serving institutions (HSIs) in the Nation to receive this designation and to become one of the first 100 institutions nationwide; this is a very special recognition. This designation requires that the President of the United States send the Governor of Puerto Rico a certification that should be handed to the president of PUPR designating the Institution as a CAE/IAE at a National level. The Congress and all the respective Congressional Committees are also notified.

Some of the benefits of the CAE/IAE designation are:
• PUPR will receive formal recognition from the U.S. Government as well as opportunities for prestige and publicity for our role in securing the Nation’s information systems.
• This designation increases collaboration opportunities between designated and aspiring institutions at local and national levels. This includes internships, faculty and student exchange, research, and publications, among other activities.
• With this designation as a CAE/IAE PUPR can obtain scholarships that can help outstanding students to pursue graduate studies in IA, enabling them to work with the Federal Government or other federal institutions and agencies.
• PUPR can compete and benefit from proposal calls (RFP) that are specifically for designated CAE/IAE institutions. These proposals offer millions of dollars from the DoD, NSF, NSA and “Homeland Security”, among others, for research and infrastructure.
• Student scholarships offered under the NSF’s Scholarship for Service (SFS) program. The SFS scholarship offers the following:
–2-year scholarship, includes 8K stipend (12K for graduate students), plus tuition and nominal room and board expenses.
–Paid summer internship in a federal agency.
–Placement in federal government at the end of the scholarship period.

IRW: RIA Vulnerabilities

May 4, 2009

The current issue of IRW should reach subscribers' inboxes tomorrow.

In this issue:

Featuring article: RIA Vulnerabilities

This issue of the newsletter discusses how hackers might be exploiting Web vulnerabilities found in Rich Internet Applications (RIAs). As mentioned in our previous issue, some RIAs are based on Adobe’s technologies like Flash, Flex, or AIR. Some are designed to be run online or offline. Their rising popularity has attracted developers and marketers, and -as expected- hackers and spammers.

QA: Excel Vector Normalization: How do I convert a row vector into a unit vector?
Who is Who in IR: C.J. van Rijsbergen
Top CS Departments: Polytechnic University of Puerto Rico
Historical Notes: ENIAC Computer
Outstanding Graduate Theses
Calls and Events
Research Blogs
and more…

Hackers Hit Pentagon

April 22, 2009

It happened again: Thanks to Web vulnerabilities, hackers were able to hit the Pentagon. 

According to CNN (http://www.cnn.com/2009/US/04/21/pentagon.hacked/),

Thousands of confidential files on the U.S. military’s most technologically advanced fighter aircraft have been compromised by unknown computer hackers over the past two years, according to senior defense officials.

The Internet intruders were able to gain access to data related to the design and electronics systems of the Joint Strike Fighter through computers of Pentagon contractors in charge of designing and building the aircraft, according to the officials, who did not want to be identified because of the sensitivity of the issue.

In addition to files relating to the aircraft, hackers gained entry into the Air Force’s air traffic control systems, according to the officials. Once they got in, the Internet hackers were able to see such information as the locations of U.S. military aircraft in flight.

This news is quite relevant to my Fall 2009 Web Vulnerability graduate course (http://www.miislita.com/courses/airweb-web-spam-syllabus.pdf)

BTW, the Associate Director of the CS Department at PUPR.edu, Dr. Alfredo Cruz, also a colleague and friend, called me two days ago with some great news: the department has been accredited for 2009-2014 as a National Center of Academic Excellence in Information Assurance Education. Soon it will be listed with the members of this exclusive “club” on the National Security Agency web site (http://www.nsa.gov/ia/academic_outreach/nat_cae/institutions.shtml).

An official press release and formal presentation before the pertinent authorities are being coordinated for the next few weeks or so.

The next issue of IR Watch – The Newsletter provides additional coverage of this exciting news.

I have tied these two news items together in a single post to underscore the need for IR/data mining courses at the intersection of Information Security, which is precisely the mission statement of IRW, now reaching more than 300 investigators/research centers.

IRW Newsletter: Web & Data Mining with RIAs

April 8, 2009

The current issue of IRW should be in subscribers' inboxes today or tomorrow, at the latest.

In this issue of the newsletter we cover Rich Internet Applications (RIAs) and how these can be used for Web/Data Mining. A RIA is a browser-independent application that can be compiled and run from the desktop.

In this issue:

Featuring article: Web & Data Mining with RIAs
QA: Recommended RIAs
Who is Who in IR: Bruce Croft
Top CS Departments: UMass, Amherst
Historical Notes: John von Neumann and Bugs
Outstanding Graduate Theses
Calls and Events
Research Blogs
and more…

IRW currently reaches a fine audience of university and government researchers and their labs. If you are a graduate student or IR practitioner and want to be known within this exclusive circle, submit a short article (2 to 3 pages, IRW format, free from marketing and sales pitches) for consideration.

RSJ-PM: Probabilistic Model Tutorial

March 30, 2009

As promised, I am pleased to announce the publication of the Robertson-Sparck Jones Probabilistic Model Tutorial.

It is available at Mi Islita.com in the Tutorials section. A link is provided on the index page.

The tutorial guides you through the intricacies of RSJ-PM. It is a great start for CS students and teachers interested in probabilistic models in information retrieval.

Enjoy it.

Due to the time spent on it, the April issue of the IR Watch newsletter will be a bit delayed.

Vector Normalization with Excel

March 4, 2009

Unit vectors are frequently used in information retrieval and data mining studies because they simplify further calculations and analyses.

In the current issue of IR Watch, we show how easy it is to convert column vectors into unit vectors with Excel. It is assumed you know how to define spreadsheet arrays in Excel and how to enter formulas in them.

Say we have two vectors in columns A and B each with four elements. To convert these into unit vectors, do this:

1. In cell C1, enter the formula =A1/(SQRT(SUMSQ(A$1:A$4)))

2. Copy the content of C1 and paste it into cell D1. This creates a modified instance of the formula that refers to column B.

3. Copy cells C1 and D1 and paste them into the remaining empty cells of these columns by selecting these at once. This also creates modified instances of the formulas.

Columns C and D now hold the unit vectors.

A figure with a step-by-step example is given in IRW (free subscription).

Below is another example, but with the final results.

A B C D
1 8 0.13 0.36
2 10 0.26 0.45
4 12 0.53 0.53
6 14 0.79 0.62

That was easy!

If you use the first row to label columns, as in this example, be sure to readjust the formulas so these start at row 2 and run through row 5; e.g., =A2/(SQRT(SUMSQ(A$2:A$5))).
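The same column-wise normalization can be sketched in Python for readers who want to verify the numbers. The function name normalize_columns is made up for this illustration; the data are columns A and B of the example above.

```python
from math import sqrt

def normalize_columns(table):
    """Divide each element by its column length, like =A1/SQRT(SUMSQ(A$1:A$4))."""
    columns = list(zip(*table))  # view the rows as columns
    lengths = [sqrt(sum(x * x for x in col)) for col in columns]
    return [[x / length for x, length in zip(row, lengths)] for row in table]

# Columns A and B of the example, one spreadsheet row per inner list
table = [[1, 8], [2, 10], [4, 12], [6, 14]]
unit = normalize_columns(table)

for row in unit:
    print([round(x, 2) for x in row])  # compare with columns C and D
```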

If you still have questions on how to do this, email me or subscribe to IRW.

IRW-March-2009:Data Mining Dates

March 2, 2009

The current issue of the IRW newsletter is available now.

In this issue:

Featuring article: Data Mining Dates

QA: Excel Vector Normalization

Who is Who in IR: Stephen Robertson

Top CS Departments: School of Informatics, City University, London

Historical Notes: Mark and Colossus Computers

Outstanding Graduate Theses

Calls and Events

Research Blogs

and more…

The abstract of the featuring article is given below.

In this issue of the newsletter we examine the extraction of intelligence from dates. At first, a discussion on dates seems an unnecessary exercise. After all, many are inclined to take dates at face-value. But a date is more than a one-liner of information extracted from a calendar, headline, or footer. In the intelligence community, for example, dates provide a great amount of information about events, people, organized crime, terrorism, money laundering, unexpected situations, accidents, plots, chains of custody, validations, etc. Indeed, a date is a unique form of metadata, not to mention that these can be either relative or absolute. They can also be part of encryption schemes.

IRW: Data Mining Credit Cards

February 2, 2009

The current issue of IR Watch – The Newsletter will be available during the day. It consists of the following sections.

Featuring Article: Data Mining Credit Cards

In this issue of the newsletter we cover Luhn’s Algorithm, also known as the Modulus 10 or Mod-10 Test. This algorithm is used for data mining and validation of credit card numbers. Credit card fraud is a topic that never goes away.

QA: Types of Links

What is the difference between in-links, out-links, co-citation, and co-reference?

Historical Notes: The Whirlwind Project

Top CS: State University of New Jersey, Rutgers

Who is Who in IR: Tefko Saracevic

Graduate Theses

Data Mining Blogs

and more.
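For readers who have not seen the Mod-10 test mentioned in the featuring article, here is a minimal Python sketch of the standard Luhn check. It is not taken from the newsletter; the function name luhn_valid is made up for this illustration, and the sample numbers are well-known test values, not real cards.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn (Mod-10) test."""
    digits = [int(ch) for ch in str(number) if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_valid("79927398713"))          # classic valid example
print(luhn_valid("79927398710"))          # last digit changed
print(luhn_valid("4111 1111 1111 1111"))  # widely used test number
```

Passing the test only means the number is well formed; it says nothing about whether an account exists.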

Tons of Credit Card Transactions Exposed at HPY

January 22, 2009

We learned about this news from a business associate:

According to USAToday, Heartland Payment Systems (HPY) on Tuesday disclosed that intruders hacked into the computers it uses to process 100 million payment card transactions per month for 175,000 merchants.

In IRW – The Newsletter, we have covered data mining of VINs, SSNs, web analytic frauds, and email headers. It might be time to cover credit card mining so readers will understand the risks involved when servers, even test servers, are not properly secured or supervised.

Data Mining at the intersection of Information Retrieval, Business Intelligence, and Information Security is here to stay.

Matrix Multiplication in Excel

January 7, 2009

The QA section of the current issue of our IRW Newsletter has a practical piece of knowledge.

Question

In Excel, how do you multiply any two matrices M1 and M2 to get a third one M3?

Answer

I assume you know how to define an array in Excel (i.e., the first and last cell of a selected rectangular region defines an array).

Thus, let M1 be a matrix of r1 rows and c1 columns and M2 be a matrix of r2 rows and c2 columns. The product is defined only when c1 = r2. Multiplying M1 times M2 results in a matrix M3 of r1 rows and c2 columns (a square matrix when r1 = c2); i.e.

M1 * M2 = M3

To carry out this operation in EXCEL, do this:

1. Open a spreadsheet and enter M1 and M2 numerical data. Next, select an array region (r1 x c2) by dragging your mouse on the spreadsheet. This will be M3.

2. Type =MMULT(Array1, Array2) in the formula (fx) field. Next, press F2.

3. Finally, press Ctrl + Shift + Enter and witness the magical black box of EXCEL.

An example is provided in Figure 1 of the newsletter.
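For comparison, here is a minimal Python sketch of what MMULT computes, including the dimension check: the product exists only when the number of columns of M1 equals the number of rows of M2. The function name mmult and the sample matrices are made up for this illustration.

```python
def mmult(m1, m2):
    """Multiply two matrices given as lists of rows, like Excel's MMULT."""
    r1, c1 = len(m1), len(m1[0])
    r2, c2 = len(m2), len(m2[0])
    if c1 != r2:
        raise ValueError(f"cannot multiply {r1}x{c1} by {r2}x{c2}")
    # M3 has r1 rows and c2 columns
    return [
        [sum(m1[i][k] * m2[k][j] for k in range(c1)) for j in range(c2)]
        for i in range(r1)
    ]

M1 = [[1, 2, 3], [4, 5, 6]]       # 2 x 3 (hypothetical data)
M2 = [[7, 8], [9, 10], [11, 12]]  # 3 x 2 (hypothetical data)
M3 = mmult(M1, M2)                # 2 x 2

for row in M3:
    print(row)
```

In Excel the destination array must be selected with exactly r1 rows and c2 columns, which corresponds to the shape of M3 here.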

From time to time, more complex math operations will be described in future issues of IRW.

Data Mining Email Headers Part II

January 6, 2009

This is a follow-up to yesterday’s post. The following trick, discussed in the IRW newsletter, helps you mine email headers, even those from automatic responders and failed-delivery emails.

The trick is to read email headers from the bottom up. The last “Received” is more trusted than the others, which are forgeable. The line corresponds to the original sender.

Here is a technique discussed in IRW that you can use to identify which headers are inserted by your ISP. Send an email to yourself through your ISP account and check the email headers of both messages. If your ISP is not using an SMTP proxy masquerade, chances are it leaks the name of the workstation used to create the email in the HELO command, along with your ISP name, IP, and possibly other interesting information.

Armed with this information, analyze the headers of emails you receive from automatic responders and “failed to deliver” email messages. Now you know which headers are from your ISP and which were inserted as the email traveled from server to server before reaching your inbox.

For instance, I recently got an automatic response wherein the HELO says

Received: from unknown (HELO UKMAIL.sportex.com) (213.86.197.130)
by server-3.tower-157.messagelabs.com with SMTP; 6 Jan 2009 05:04:59 -0000

An IP lookup reveals additional information. The rest of the headers also leak interesting stuff.
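Pulling the HELO name and IP out of a header like the one above is easy to automate. Here is a minimal Python sketch using the standard email module and a regular expression. The pattern only covers the "from ... (HELO ...) (ip) by ..." layout quoted above, other mail servers format this line differently, and the function names and the Subject line in the sample are made up for this illustration.

```python
import re
from email import message_from_string

# Matches the layout quoted above: from <name> (HELO <helo>) (<ip>) by <server>
RECEIVED_RE = re.compile(
    r"from\s+(?P<name>\S+)\s+\(HELO\s+(?P<helo>[^)\s]+)\)\s+"
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\)\s+by\s+(?P<by>\S+)",
    re.IGNORECASE,
)

def received_bottom_up(raw_message):
    """Return the Received headers in bottom-up order."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    return list(reversed(hops))

def parse_hop(received_value):
    """Return name, helo, ip, and by for one Received line, or None."""
    match = RECEIVED_RE.search(received_value)
    return match.groupdict() if match else None

# The Received line is the one quoted above; Subject and body are hypothetical.
sample = (
    "Received: from unknown (HELO UKMAIL.sportex.com) (213.86.197.130)\n"
    " by server-3.tower-157.messagelabs.com with SMTP; 6 Jan 2009 05:04:59 -0000\n"
    "Subject: Out of office\n"
    "\n"
    "body\n"
)

for hop in received_bottom_up(sample):
    print(parse_hop(hop))
```

Lines that do not match the pattern return None, so a real tool would need one pattern per mail server family.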

Data Mining Email Headers

January 5, 2009

The featuring article of IRW explains how to access, read, and interpret email headers. Several techniques for tracking down spammers are also disclosed.

We show whether your ISP or email client might be adding headers that unnecessarily disclose important information like the name of the machine used to send an email, your ISP name and IP, your email vendor, which antivirus software your ISP might be using, etc.

For instance, this morning I received the following unsolicited email asking for a link exchange:

Hi,

My name is David Stern, and I am contacting you on behalf of our client ***

*** is London’s most exclusive personal training and therapy centre.

I have visited your site and see that your site is sufficiently related to their domain. It would be great if we can have website *** linked to yours. In lieu of this link, we will provide a link back from one of our best directories and from same Google PageRank page.

The email headers identify in the HELO command the sender’s local machine. I’m disabling the link using asterisks.

Received: from [122.162.66.40] (helo=smtp.net4india.com)
by smtp.net4india.com with smtp (Exim 4.66) <*a href=*mailto:linkmanager@business-onlinedirectory.com”>linkmanager@business-onlinedirectory.com<*/a>)

If HELO is not present, there are plenty of data mining techniques to use.

Sneak Preview of IRW 2009-2-1

January 2, 2009

Data Mining Email Headers

This is a sneak preview of the current issue of IRW The Newsletter.

In this Issue:

Featuring article: Data Mining Email Headers – and Senders and ISPs
Questions/Answers: EXCEL Matrix Multiplications
Who is Who in IR: William S. Cooper
Top CS Departments: UC, Berkeley
Historical Notes: ABC and Z3 Computers
Outstanding Graduate Theses
Calls and Events
Research Blogs
and more…

Featuring article abstract:

In this issue of the newsletter we reproduce written material relevant to the  mining of email headers, senders, and ISPs. Spammers, viruses, and hackers love to fake email headers as these provide setting information that might serve as entry points around which a strategy could be crafted. Intelligence data miners also love email headers. Since these can be faked, they can be used to encode hidden messages. Actually, email steganography is an exciting area.

More on IRW

December 4, 2008

The current issue of IRW also covers:

  1. Henry Freiser’s Pointer Function for visualizing all real roots of a polynomial.
  2. Vannevar Bush’s first computers and Bell’s CNC.
  3. Cyril Cleverdon: the IR Father of Precision and Recall.
  4. More graduate students CS/IR Theses.
  5. MIT’s CS Department.
  6. More IR blogs.
  7. Call for Papers.

Search Engines and SSNs

December 3, 2008

 In the current issue of IRW we explain why facilitating social security numbers (SSNs) online is an enabling crime; one that is relevant to Homeland Security (1). We show that, ironically, government agencies and universities are the first facilitators of SSNs on the Web.

We examined how crafting smart queries in Google and other search engines allows users to find incidents wherein SSNs have been released for the entire world to see online. Although nothing new, it is a widespread problem across the Web. It is a shame when administrators of the above two offenders (government and university dependencies) ignore the problem or justify it in the name of what is practical.

We show why the common practice of facilitating the last four digits of an SSN is a very bad idea. With SSN Allocation tables, we can map the first three digits to the region wherein the SSN application was filed, by US State and territory. If the last four digits are known, only the middle two digits need to be guessed. Identity thieves and stalkers might be having a field day.

There is still hope, though. We cover how Northern Michigan University (2) and Johns Hopkins University (3) are proactively becoming part of the solution and not part of the problem. In the case of NMU, they have published a one-year case study outlining the full eradication of SSNs as identifiers from the NMU campus.

 References

1. The Homeland Security and Terrorism Threat: From Document Fraud, Identity Theft and Social Security Number Misuse
http://finance.senate.gov/hearings/testimony/2003test/091003pctest.pdf
2. Full Eradication of Social Security Number as an Identifier
http://net.educause.edu/ir/library/pdf/EDU04144.pdf
3. Policy on Social Security Number Protection and Use
http://education.jhu.edu/catalog/academic-policies/policy-on-ssn-protection-and-use/

IRW Sneak Preview: Identity Thefts through Search Engines

December 1, 2008

This is a sneak preview of IR Watch. In this issue the main article, Identity Thefts through Search Engines, covers quite old and well known incidents wherein social security numbers have been released for the entire world to see. These are accessible through search engines.

Although not a new problem, facilitating an SSN, even a portion of it, has been labeled as an enabling crime. This is a must-read topic for those conducting data mining and web mining at the intersection of information assurance and homeland security.

Ironically, the biggest offenders are government agencies and universities.

During the week, we will blog on other sections of the newsletter.

Outstanding IR Theses

November 18, 2008

Thank you to all the new researchers who have signed up to receive IRW, now in its new format. The following theses are listed in the Outstanding Graduate Theses column:

DNIDS: A Dependable Network Intrusion Detection System Using the CSI-KNN Algorithm

A Hybrid Knowledge-based/Content-based Recommender System in the Bluejay Genome Browser

Exploitation of Redundant Inverse Term Frequency for Answer Extraction

Improving the effectiveness of information retrieval with genetic programming

BTW our next issue is closed and will go out by the first of December. The featuring article is about identity theft through search engines.

IRW Sneak Preview: Fraudulent Web Analytics

October 31, 2008

This post is the monthly sneak preview of the next issue of IR Watch Newsletter, now in its new format.

In this issue, the featuring article aims to raise awareness of some of the schemes used to defraud those who make business decisions based on Web Analytics. If you are an advertiser or investor, you must read it. Don’t be gamed by unethical marketers and spammers.

The article exposes how some marketers/spammers engineer the fraud by gaming the wisdom of crowds. We expose how traffic fraud, click-through injections, and form injections are used within viral networks to produce bogus Web Analytics advertisers might be paying for or using to make critical decisions.

The Question of the Month column is dedicated to precision vs. recall.

In the Who is Who in IR section, the late Karen Sparck Jones is featured.

In the Top CS Departments, the CS Dept of Stanford University is featured.

We have a new column dedicated to historical notes on computers, search engines, and IR. In the current column, Hewlett-Packard’s origins are highlighted.

Last, but not least, more IR blogs and graduate theses are listed.

Now, some great news! Please keep reading.

We are currently in negotiations with a local university to co-launch an interesting start-up at the intersection of IR, search engines, and business research.

The way we see it, a bad economy presents opportunities. The time is right for such a unique project.

Getting Ready for AIRWeb2009

October 13, 2008

For the last few years I have served as a PC member of AIRWeb. I just received and accepted an invitation to be a PC member for AIRWeb 2009.

For those of you not familiar with it, the International Workshop on Adversarial Information Retrieval on the Web (AIRWeb) http://airweb.cse.lehigh.edu/ has been held four times, in conjunction with WWW’05, SIGIR’06, WWW’07, and WWW’08.

Topics discussed at the workshops include all forms of search engine spamming and hacking practices. SEO spamming practices are exposed and countermeasures are tested. It is a lot of fun examining in advance manuscripts describing these malicious practices, months before the accepted papers hit the mainstream.

Incidentally, the next issue of the IR Watch newsletter features Fraudulent Web Analytics, an article on adversarial techniques. We expose several practices spammers and hackers use to produce fake analytics and to defraud advertisers.

Why IRW

October 3, 2008

Thank you to those that have asked for back issues of the IRW newsletter. Titles will be listed in chronological order soon and made available to subscribers.

The new version of IRW has been released.

Why you should subscribe to IRW

Subscribe to IRW and be able to read or publish short articles on any topic at the intersection of information retrieval, data mining, business intelligence, and information security.

We look for how-to manuscripts; i.e., for articles that explain in transparent ways how heuristic concepts or algorithms work and why these are important. Build a reputation as a computer science how-to expert. Network within a selective circle of researchers.

By subscribing you will also be able to:

• Get answers or address research questions.
• Nominate the most influential IR scientists.
• Nominate top computer science departments.
• Recommend outstanding CS graduate theses.
• Announce upcoming or ongoing calls and events.
• Recommend your favorite IR/DM blogs.

Not a subscriber? See what you have been missing:

IRW

Sneak Preview of IRW: Data Mining VINs

October 1, 2008

Issue 1 of the new version of IRW is out. It should be in subscribers' inboxes today or, at the latest, tomorrow.

In this issue:

Featuring Article: Data Mining VINs
Question of the Month: What are some of the most common typos in data entry?
Who is Who in IR: Gerard Salton
CS Departments: Cornell University
Theses (by authors)
Calls and Events (by organizations)
Blogs (by categories)

Goodbye IR Watch?

September 23, 2008

Goodbye IRW. That’s right. After a few years, IR Watch – The Newsletter has come to an end. Thank you for the memories.

Well, not exactly. :)

Welcome to the new version of IR Watch.

As mentioned in previous issues of IRW, we have redesigned the newsletter to conform to our changing times and needs. Instead of a single, long article about an IR topic, the new version consists of short sections. The main section is reserved for a two-page article about a topic at the interface of IR, data mining, and intelligence.

The other sections are intended to be powered by readers. This means that readers of IRW, IR Thoughts, and Mi Islita.com can submit material for inclusion in the newsletter. Three types of submissions are available:

One-page articles
Nominations and Recommendations
Event Announcements

The following sections are available

Featuring Article – Main article on IR and data mining. Submit one.
Who is Who in IR – Nominate your favorite IR scientists with a short bio.
CS Departments – Feature your favorite Computer Science department.
Theses – For students, by students. Recommend a graduate thesis on IR or Data Mining.
Calls and Events – This section is reserved for call for papers and event announcements.
Blogs – For bloggers, by bloggers. Recommend an IR or DM blog for inclusion in the following categories: blog directories, organization blogs, researcher blogs, student blogs.

All these informational services are free.

We might be testing other sections as well. As with its predecessor, subscriptions are free of charge, at least for the time being. The first issue of the new IRW goes out on October 1, 2008.

Over the years, readers have requested back issues. To cover operational costs, we have been forced to institute a $20 fee for any back issue request. Back issues of the old IRW are also available for the same price.

Another feature of the newsletter is the announcement of illustrated tutorials on IR and DM in …(shocking!) EXCEL. The spreadsheet templates do all the math, so tutorial users won’t have to.

Stay tuned for additional details, because there are more reasons for subscribing to IRW.

IPAM Upcoming Workshops

September 9, 2008

IPAM (Institute for Pure and Applied Mathematics at UCLA) sent us the current schedule for the upcoming workshop seminars. Back in January 2006 we attended the now-famous Document Space Workshop, and the experience there was a real nirvana. We had the opportunity to meet its then director, Dr. Mark Green, and a few other world-class researchers, such as Dr. Michael Berry, an expert in LSI.

IPAM now has a new director and associate director. According to them:

“Dr. Russel Caflisch, UCLA professor of mathematics, was appointed as IPAM Director on July 1, 2008. Dr. Jichun Li, an associate professor of mathematics at University of Nevada Las Vegas, joined the IPAM scientific staff in August; he will serve a two-year term as one of IPAM’s Associate Directors, along with Dr. Christian Ratsch. Please help us welcome Dr. Caflisch and Dr. Li to IPAM!”

We highly recommend that readers who are able to do so attend the IPAM workshops. For those interested in attending, the current schedule of events is given below.

Upcoming IPAM Long Programs:

 Each IPAM long program will involve a community of senior and junior researchers. The intent is for long-term participants to have an opportunity to learn about the topic of the program from the perspectives of many different fields and to meet a diverse group of people and have an opportunity to form new collaborations. In addition to these activities, there will be opening tutorials, four workshops (each one is listed under “upcoming workshops”), and a culminating workshop at Lake Arrowhead. Funding is available both to attend our entire 3-month program and to attend individual workshops; those interested are encouraged to apply through the website of the program that interests them.  Applications received at least six weeks in advance of the long program will receive fullest consideration.

Internet Multi-Resolution Analysis

September 8 – December 12, 2008

http://www.ipam.ucla.edu/programs/mra2008/

Quantum and Kinetic Transport Equations: Analysis, Computations, and New Applications

March 9 – June 12, 2009

http://www.ipam.ucla.edu/programs/kt2009/

Combinatorics: Methods and Applications in Mathematics and Computer Science

September 8 – December 11, 2009

http://www.ipam.ucla.edu/programs/cma2009/

Model and Data Hierarchies for Simulating and Understanding Climate

March 8 – June 11, 2010

Webpage will be posted soon.

Upcoming IPAM Workshops (through December 2009):

A registration form and an application for funding are available on each program’s webpage.  Applications received six weeks in advance of the workshop will receive fullest consideration.

Internet MRA Tutorials

September 9 – 12, 2008

http://www.ipam.ucla.edu/programs/mratut/

Multiscale Representation, Analysis and Modeling of Internet Data and Measurements

September 22 – 26, 2008

http://www.ipam.ucla.edu/programs/mraws1/

Applications of Internet MRA to Cyber-Security

October 13 – 17, 2008

http://www.ipam.ucla.edu/programs/mraws2/

Beyond Internet MRA: Networks of Networks

November 3 – 7, 2008

http://www.ipam.ucla.edu/programs/mraws3/

New Mathematical Frontiers in Network Multi-Resolution Analysis

November 17 – 21, 2008

http://www.ipam.ucla.edu/programs/mraws4/

Quantitative and Computational Aspects of Metric Geometry

January 12 – 16, 2009

http://www.ipam.ucla.edu/programs/mg2009/

Numerical Approaches to Quantum Many-Body Systems

January 22 – 30, 2009

(Three-day tutorials followed by five-day workshop)

http://www.ipam.ucla.edu/programs/qs2009/

Laplacian Eigenvalues and Eigenfunctions: Theory, Computation, Application

February 9 – 13, 2009

http://www.ipam.ucla.edu/programs/le2009/

Rare Events in High-Dimensional Systems

February 23 – 27, 2009

http://www.ipam.ucla.edu/programs/re2009/

Quantum and Kinetic Transport Equations: Tutorials

March 10 – 13, 2009

http://www.ipam.ucla.edu/programs/kttut/

Computational Kinetic Transport and Hybrid Methods

March 30 – April 3, 2009

http://www.ipam.ucla.edu/programs/ktws1/

The Boltzmann Equation: DiPerna-Lions Plus 20 Years

April 15 – 17, 2009

http://www.ipam.ucla.edu/programs/ktws2/

Flows and Networks in Complex Media

April 27 – May 1, 2009

http://www.ipam.ucla.edu/programs/ktws3/

Asymptotic Methods for Dissipative Particle Systems

May 18 – 22, 2009

http://www.ipam.ucla.edu/programs/ktws4/

Combinatorics Tutorials

September 9 – 16, 2009

Webpage will be posted soon.

Probabilistic Techniques and Applications

October 5 – 9, 2009

http://www.ipam.ucla.edu/programs/cmaws1/

Combinatorial Geometry

October 19 – 23, 2009

http://www.ipam.ucla.edu/programs/cmaws2/

Topics in Graphs and Hypergraphs

November 2 – 6, 2009

http://www.ipam.ucla.edu/programs/cmaws3/

Analytical Methods in Combinatorics, Additive Number Theory and Computer Science

November 16 – 20, 2009

http://www.ipam.ucla.edu/programs/cmaws4/

Complimentary version of IR Watch

June 26, 2008

By now subscribers should have the current issue of IR Watch – The Newsletter in their inbox.

Non-subscribers: A complimentary version with a few minor changes is available online at http://www.miislita.com. Take advantage of this freebie while you can. Let others know about IRW and what they are missing.

IRW might be subject to a few changes in the future.