Archive for the ‘Conferences’ Category

For SEO Spammers: AIRWeb 2008 Presentations

April 29, 2008

To facilitate mainstream dissemination of the manuscripts presented at AIRWeb 2008, here are the papers as listed over at http://airweb.cse.lehigh.edu/2008/program.html

SEO spammers, whether your life gravitates around a “social network circus” or “link building” or not, it is time to revisit your drawing board.

8:30 - 10:00

10:30 - 12:00

13:30 - 15:00

15:30 - 17:00

  • Web Spam Challenge
    • (5 min.) Description of the challenge
    • (12 min.) Data Analysis School, Moscow slides
      Konstantin Bauman, Alexey Brodskiy, Sergey Kacher, Elmira Kalimulina, Ruslan Kovalev, Mikhail Lebedev, Dmitry Orlov, Pavel Sushin, Pavel Zryumov, Dmitry Leshchiner and Ilya Muchnik
    • (12 min.) Computer and Automation Research Institute, Hungarian Academy of Sciences slides
      David Siklosi, Andras Benczur
    • (12 min.) Institute of Automation, Chinese Academy of Sciences, Beijing slides
      Guanggang Geng, Xiaobo Jin and Chunheng Wang
    • (5 min.) Announcement of results
  • Panel
    • (45 min.): The Future of Adversarial IR on the Web
      Amit Aggarwal, Zoltán Gyöngyi, Alexandros Ntoulas, Erik Selberg, and Andrew Tomkins

A Few Rants: Microsoft, a Conference, and a Database Site

April 11, 2008

I normally don’t rant on this blog about trivial stuff in life, since this blog is about IR and search engine research. Today I feel like making an exception. So let’s see how I can tie a few rants about silly everyday things to search engines.

Rant 1

I bought the Home and Student version of Microsoft Office ($122, through Costco). The learning curve started. I tried to open its case by just pulling off the red tab as suggested. The red tab came off the case and still there was no “open Sesame”. I then tried different things until I decided to slice the clear seal at the top of the case with a knife and voila! Nothing like a Puerto Rican solution for a “Made in Puerto Rico” Windows Vista product! Duh!

So the recipe is: (1) get a knife, (2) slice the seal, and (3) pull the case indentations toward your right with your fingers. The inside case should open.

Out of curiosity I wanted to know if others out there struggled with the design of the case. I ended up googling for how to open windows office case and found this site, which discussed the very same problem and the very same solution. I realized I was not alone.

There are now dozens of sites like this one that show users this dumb “how-to”. Many are complaining about the “brilliant” design of the box, which is just a usability and accessibility nightmare.

Read what others at the aforementioned site are commenting. Some commented that they ended up searching for:

open office 2007 box
open vista box
Office Packaging “how to open”
open microsoft office box
how to open MS office 2007 box

Something on the product design side is wrong when soooo many have to Google for just how to open the damn case of a Microsoft product, or of any product for that matter. Something is wrong when Microsoft lab rats have to explain online how to open the annoying case.

Rant 2

There is a local conference on information security I was invited to. Down the organization pipeline, something is wrong with a conference when its organizers have to chase potential presenters one week before the event. I pass and wish them good luck.

Rant 3

There is a local company that created a database-driven site for the upcoming Elections. The problem: how to get politicians and average users to know how to use the technology. Also, the site already needs to be redesigned so it can rank high and gain traffic from search engine users.

All of these kind of belong in the Land of Duh.

Demystifying LSI Video

April 7, 2008

Here is a video of my presentation, Demystifying LSI, at the OJOBuscador Congress 2.0, Madrid, Spain, 2007. One year later, nothing has changed. Many of the same crook SEOs exposed during the congress are still deceiving the public about what LSI is.

Unfortunately, the video quality and lighting are not good enough to see the pdf slides, plus the presentation is in Spanish. Since attendees were not scientists, I spoke very slowly for over an hour.

Want to get bored for the next hour? View the video.

Thanks to N. Valenzuela Alonso, Director of SEO and Search Engine Marketing of Media Bit, S.L. for the link (www.ithinksearch.com/2008/03/31/video-lsi-de-edel-garcia-desmitificando-lsi/).

Here is also the presentation of Carlos Castillo (Chato), from Yahoo! Research Spain:

Adversarial IR with Web Spam, parts 1 and 2 
(http://www.ojobuscador.com/2007/06/14/ir-con-adversario-y-webspam-videopost/).

I had a great time talking with Carlos, a former grad student of Ricardo Baeza-Yates.

Baeza-Yates, Andrei Broder, Gerard Salton, Keith van Rijsbergen, and a few others have helped to shape what is today known as Information Retrieval Research.

Talking about Andrei Broder (one of the main researchers behind the old mighty AltaVista), here is also a great interview, thanks to the ojobuscador site:
http://www.ojobuscador.com/2006/05/20/entrevista-a-andrei-broder/

 

Understanding Search Engines

April 3, 2008

On April 23rd I’ll be presenting a seminar lecture at the University of Puerto Rico, Bayamon (http://www.uprb.edu).

Topic, time, abstract, level, and requirements follow.

Topic: Understanding Search Engines

Time: 12 Noon

Abstract: What are search engines? How do they work? What are their main components? How do they analyze document relevancy? What does it take to rank a web page in the top 10 search results of Google, Yahoo, and other search engines? These questions will be addressed in this lecture.

Level: Beginners.
Requirements: None.

If you are a UPRB student, faculty member, or staff member and happen to be an SEO or webmaster, don’t miss this rare opportunity to learn answers to these and similar questions.

I will also use that opportunity to promote the upcoming conference we are co-organizing at Polytechnic University:

Search Engines and Information Security

This will be held on October 3 & 4, 2008. Additional information will be provided soon at the PUPR.edu and Mi Islita.com sites.

AIRWeb-2008 Last Call for Papers & Deadline Extension

February 25, 2008

AIRWEB organizers have instructed me to disseminate the following Final Call For Papers and deadline extension. Let’s fight the spammers and those disguised as SEOs.

We have extended the submission deadline until March 2, 2008. We would appreciate any assistance in disseminating this extension. The text version of the final call for papers is below and the pdf version is attached.

Best regards.
Carlos Castillo
Kumar Chellapilla
Dennis Fetterly

FINAL CALL FOR PAPERS and 9 day extension
Fourth International Workshop on
Adversarial Information Retrieval on the Web
http://airweb.cse.lehigh.edu/2008/

IMPORTANT DATES

02/March/2008 : Deadline for research articles
31/March/2008 : Deadline for challenge submissions
22/April/2008 : Workshop at the WWW 2008 conference in Beijing, China

Contents:

1. AIRWeb’08 Topics
2. Web Spam Challenge
3. Timeline
4. Organizers and Program Committee

1. AIRWEB’08 TOPICS

Adversarial Information Retrieval addresses tasks such as gathering,
indexing, filtering, retrieving and ranking information from
collections wherein a subset has been manipulated maliciously. On the
Web, the predominant form of such manipulation is “search engine
spamming” or spamdexing, i.e., malicious attempts to influence the
outcome of ranking algorithms, aimed at getting an undeserved high
ranking for some items in the collection.

We solicit both full and short papers on any aspect of adversarial
information retrieval on the Web. Particular areas of interest
include, but are not limited to:

* Link spam
* Content spam
* Cloaking
* Comment spam
* Spam-oriented blogging
* Click fraud detection
* Reverse engineering of ranking algorithms
* Web content filtering
* Advertisement blocking
* Stealth crawling
* Malicious tagging
* Ping spam

Proceedings of the workshop will be included in the ACM Digital
Library. Full papers are limited to 8 pages; work-in-progress papers
will be permitted 4 pages. Papers should be formatted using the WWW2008
proceedings style and submitted via
http://www.easychair.org/conferences/?conf=airweb2008

For more information, see http://airweb.cse.lehigh.edu/2008/

2. WEB SPAM CHALLENGE

Last year we introduced a novel element at the workshop: a Web Spam
Challenge for testing web spam detection systems. We will be holding
the Web Spam Challenge again this year, using the WEBSPAM-UK2007
collection for Web Spam Detection http://www.yr-bcn.es/webspam

The collection includes a large set of web pages, a web graph, and
human-provided labels for a set of hosts. We will also provide a set
of features extracted from the contents and links in the collection,
which may be used by the participant teams in addition to any
automatic technique they choose to use.

We ask that participants of the Web Spam Challenge submit predictions
(normal/spam) for all unlabeled hosts in the collection. Predictions
will be evaluated and results will be announced at the AIRWeb 2008
workshop.
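
To make the submission idea concrete, here is a minimal sketch of how a file of normal/spam predictions could be scored against human labels. The call does not say which measure the organizers use, so the F1 score on the spam class, the host names, and the rule that a missing prediction counts as "normal" are all assumptions made for illustration only.

```python
from typing import Dict


def spam_f1(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """F1 score for the "spam" class over the hosts that have a human label.

    Assumption: a host with no prediction is treated as "normal".
    """
    tp = fp = fn = 0
    for host, label in truth.items():
        guess = predicted.get(host, "normal")
        if guess == "spam" and label == "spam":
            tp += 1  # spam host correctly flagged
        elif guess == "spam" and label == "normal":
            fp += 1  # normal host wrongly flagged
        elif guess == "normal" and label == "spam":
            fn += 1  # spam host missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical hosts and labels, not taken from WEBSPAM-UK2007.
truth = {"a.example": "spam", "b.example": "normal",
         "c.example": "spam", "d.example": "normal"}
predicted = {"a.example": "spam", "b.example": "spam",
             "c.example": "normal", "d.example": "normal"}

print(spam_f1(predicted, truth))
```

With one hit, one false alarm, and one miss, precision and recall are both one half in this toy case.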

For more information, see

3. TIMELINE

- 15 February 2008: E-mail intention to submit a workshop paper
  (optional, but helpful)
- 02 March 2008: Deadline for workshop paper submissions (all day
- 24 March 2008: Notification of acceptance of workshop papers
- 31 March 2008: Challenge submissions due
- 07 April 2008: Camera-ready copy due
- 22 April 2008: Date of workshop

4. ORGANIZERS AND PROGRAM COMMITTEE

Organizers

- Carlos Castillo, Yahoo! Research
- Kumar Chellapilla, Microsoft Live Labs
- Dennis Fetterly, Microsoft Research

Program Committee

- Einat Amitay, IBM
- Andras Benczur, Hungarian Academy of Sciences
- Paul-Alexandru Chirita, Uni Hannover
- James Caverlee, Texas A&M University
- Gordon Cormack, University of Waterloo
- Nick Craswell, Microsoft Research
- Matt Cutts, Google
- Brian Davison, Lehigh University
- Ludovic Denoyer, University Paris 6
- Aaron D’Souza, Google
- Edel Garcia, Mi Islita.com
- Natalie Glance, Nielsen BuzzMetrics
- Antonio Gulli, Ask.com
- Zoltan Gyongyi, Stanford University
- Monika Henzinger, Google
- Pranam Kolari, Yahoo! Applied Research
- Mark Manasse, Microsoft Research
- Marc Najork, Microsoft Research
- Alexandros Ntoulas, Microsoft Search Labs
- Jan Pedersen, Yahoo! Search
- Erik Selberg, Amazon
- Torsten Suel, Polytechnic University
- Mike Thelwall, University of Wolverhampton
- Baoning Wu, Snap
- Tao Yang, Ask.com

Search Engines for Penetration Testing

February 21, 2008

Well, I’m getting ready for my talk this afternoon at the University of Turabo. I’ve organized the talk in three parts:

 Part 1: Spam and Fraud through Search Engines

Part 2: Gathering Intelligence through Search Engines

Part 3: Identity Theft through Search Engines

A disclaimer will be necessary to indicate that the information to be presented is for educational purposes only.

This is gonna be a nice one. I hope to see old friends.

Web Mining, Search Engines, and Information Security

February 15, 2008

This Thursday, the 21st, I’ll be presenting the following talk before the faculty of the University of Turabo, Gurabo, PR:

Web Mining, Search Engines, and Information Security

I hope to see old friends there. Here is the abstract of my talk:

Web Mining is a research area of Data Mining wherein the Web is the “database” and search engines are the “user’s interface”. End-users can resort to search engines for all sorts of things. For instance, marketers can use search engines to gain traffic by ranking Web pages high for specific queries, hence enhancing the online presence of businesses, products, and services (search engine optimization, SEO). Spammers can inundate search engine indexes to deceive searchers (spamdexing). Hackers can attempt to rank high those documents that lead to security risks (hacketers, hacketering) or use all forms of injection (links, forms, scripts, redirections, etc). Terrorists and criminals can use search engines to commit all sorts of crime-enabling activities, for instance, by stealing private information like SSNs, passwords, and student and user IDs, gaining access to “private” documentation, stalking people, etc.

This talk covers these and other aspects of search engines: the Good, the Bad, and the Ugly. The speaker will then talk about his own research projects in the area of Web Mining, Search Engines, and Intelligence. A disclaimer will be necessary to indicate that the information to be presented is for educational purposes only.

ECIR Tutorial on Terrier

February 12, 2008

Craig Macdonald, University of Glasgow, sent me the Call for Participation given below. Looking at the keynote speakers, I see that the great Amit Singhal, from Google, will be there. These days, when many marketers are busy trying to manipulate PageRank through nofollow gymnastics, they all miss the bigger picture and the work of the faces behind it.

I wish I could be at ECIR.

ECIR 2008 Tutorial - Researching and building IR applications using Terrier

Description:
This tutorial introduces the main design of a large-scale IR system, and uses the Terrier platform as an example of how one should be built. We detail the architecture and data structures of Terrier, as well as the weighting models included, and describe, with examples, how Terrier can be used to perform experiments and extended to facilitate new research and applications.

Handouts containing slides, a Terrier “crib sheet”, and detailed
examples of implementations of common research problems will be
provided, in addition to a bibliography of informative related papers.

For more details, see http://ecir2008.dcs.gla.ac.uk/tutorial_rb.html.

Organisers:
Craig Macdonald, University of Glasgow, UK
Ben He, University of Glasgow, UK

Registration:
http://ecir2008.dcs.gla.ac.uk/registration.html

ECIR 2008 2nd Call for Papers

February 6, 2008

Dr. Ellen Voorhees, from TREC over at NIST sent me this Call for Papers:

Subject: ECIR 2008 Call For Participation

2nd CALL FOR PARTICIPATION

*Full Programme and Accommodation Details Now Online*
** Early Bird Registration Deadline: 29th February 2008 **

30th European Conference on Information Retrieval (ECIR 2008)
30th March - 3rd April, University of Glasgow
http://ecir2008.dcs.gla.ac.uk/ 

Organized by: University of Glasgow
In cooperation with: BCS-IRSG, ACM SIGIR

* Conference programme (31st March - 2nd April)
The technical program will include 33 full research papers, 19 short papers, and 28 posters. The full programme is available at
http://ecir2008.dcs.gla.ac.uk/programme.html

* Tutorials and workshops (30th March)
http://ecir2008.dcs.gla.ac.uk/tutorials_workshops.html    
ECIR 2008 will host three half-day tutorials and three full-day workshops
Tutorial: Advanced language modeling approaches (case study: expert
search) -  Djoerd Hiemstra, University of Twente, The Netherlands
Tutorial: Search and discovery in user-generated text content - Maarten de Rijke and Wouter Weerkamp, ISLA, University of Amsterdam, The Netherlands
Tutorial: Researching and building IR applications using Terrier - Craig Macdonald and Ben He, University of Glasgow, UK
Workshop: Workshop on novel methodologies for evaluation in Information Retrieval - organized by Mark Sanderson, University of Sheffield, UK, Martin Braschler, Zurich University of Applied Sciences, Switzerland, Nicola Ferro, University of Padova, Italy, and Julio Gonzalo, UNED, Spain
Workshop: Efficiency issues in Information Retrieval Workshop - organized by Roi Blanco, Universidade da Coruña, Spain and Fabrizio Silvestri, ISTI CNR, Italy
Workshop: Exploiting semantic annotations in Information Retrieval - Omar Alonso, A9.com, USA and Hugo Zaragoza, Yahoo! Research Barcelona, Spain

* Industry day (3rd April)
http://ecir2008.dcs.gla.ac.uk/industry.html     
A day of presentations and discussion dedicated to the interests and needs of Information Retrieval practitioners.

* Keynote speakers
ECIR 2008 is pleased to announce the following three keynote speakers:
Professor Nicholas J. Belkin, Rutgers University, USA - Some(what) Grand Challenges for Information Retrieval
Professor Bettina Berendt, K.U. Leuven, Belgium - You are a document too: Web mining and IR for next-generation information literacy
Dr Amit Singhal, Google Research Fellow, Google, USA - Web Search: Challenges and Directions

* Paper awards
Best paper award, best student paper award and best poster award, all sponsored by Yahoo! Research

* Social events
A social event every day, including a Banquet reception on 31st March at the Kelvingrove Museum and a civic reception at the City Chambers on 1st April. Further details will be posted on the ECIR 2008 Web site.

* Registration
For registration and other information, please visit the ECIR 2008
home page at http://ecir2008.dcs.gla.ac.uk/registration.html   

* Main sponsors
Google  http://www.google.com     
Microsoft Research  http://research.microsoft.com    
Yahoo! Research  http://research.yahoo.com     
MatrixWare Information Services  http://www.matrixware.com    

We are looking forward to welcoming you in Glasgow,

The ECIR 2008 Chairs
(Ian Ruthven, Vassilis Plachouras, Ryen White and Iadh Ounis)

TREC 2008 Call for Papers

December 19, 2007

Ellen Voorhees, TREC Project Manager and Group Manager over at NIST.gov, sent me the TREC 2008 Call for Participation. To facilitate dissemination and promote the event, I’m reproducing the Call below.

CALL FOR PARTICIPATION

TEXT RETRIEVAL CONFERENCE (TREC)

February 2008 - November 2008

Conducted by:
  National Institute of Standards and Technology (NIST)

With support from:
  Intelligence Advanced Research Projects Activity (IARPA)

The Text Retrieval Conference (TREC) workshop series encourages
research in information retrieval and related applications by
providing a large test collection, uniform scoring procedures,
and a forum for organizations interested in comparing their
results. Now in its seventeenth year, the conference has become
the major experimental effort in the field. Participants in
the previous TREC conferences have examined a wide variety
of retrieval techniques and retrieval environments,
including cross-language retrieval, retrieval of web documents,
multimedia retrieval, and question answering. Details about TREC
can be found at the TREC web site, http://trec.nist.gov .

You are invited to participate in TREC 2008. TREC 2008 will
consist of a set of tasks known as “tracks”. Each track focuses
on a particular subproblem or variant of the retrieval task as
described below. Organizations may choose to participate in any or
all of the tracks. For most tracks, training and test materials are
available from NIST; a few tracks will use special collections that
are available from other organizations for a fee.

Dissemination of TREC work and results other than in the (publicly
available) conference proceedings is welcomed, but the conditions of
participation specifically preclude any advertising claims based
on TREC results. All retrieval results submitted to NIST are
published in the Proceedings and are archived on the TREC web site.
The workshop in November is open only to participating groups that
submit retrieval results for at least one track and to selected
government personnel from sponsoring agencies.

Schedule:
——–

By February 21 — submit application described below
  to NIST. Returning an application will add you to
  the active participants’ mailing list. On Feb 25,
  NIST will announce a new password for the “active
  participants” portion of the TREC web site. Included
  in this portion of the web site is information regarding
  the permission forms needed to obtain the TREC document
  disks.

Beginning March 1 — document disks used in some existing
  TREC collections distributed to participants who have
  returned the required forms. Please note that no disks
  will be shipped before March 1.

mid-July–mid-August — results submission deadline for most tracks
  (Results deadline may need to be even earlier for
  some tracks depending on assessor availability.
  Specific deadlines for each track will be included in
  the track guidelines, which will be finalized in the spring.)

September 9 (estimated) — speaker proposals due at NIST.

September 30 (estimated) — relevance judgments and individual
  evaluation scores due back to participants.

Nov 18-21 — TREC 2008 conference at NIST in Gaithersburg, Md. USA

Task Description:
—————-
Below is a brief summary of the tasks. Complete descriptions of
tasks performed in previous years are included in the Overview
papers in each of the TREC proceedings (in the Publications section
of the web site).

The exact definition of the tasks to be performed in each track for
TREC 2008 is still being formulated. Track discussion takes place
on the track mailing list. To be added to a track mailing list,
follow the instructions for contacting that mailing list as
given below. For questions about the track, send mail to the
track coordinator (or post the question to the track mailing list
once you join).

TREC 2008 will have one new track and four continuing tracks.
The new track is the relevance feedback track, a track
that will systematically explore the factors that affect
relevance feedback behavior. The blog, enterprise, legal, and
million query tracks will continue in TREC 2008, though the
specific tasks in a track may differ from year to year.
(Note that the QA track has been moved to the new Text Analysis
Conference (TAC); the call for participation in TAC
will be sent to the TREC friends mailing list.)

Blog Track — The purpose of the blog track is to explore information
  seeking behavior in the blogosphere.

Track coordinator: Iadh Ounis, ounis@cs.gla.ac.uk
  Mailing list: send a mail message to listproc@nist.gov
  such that the body consists of the line
  subscribe trec-blog

Enterprise Track — The purpose of the enterprise track is
  to study enterprise search: satisfying a user who is
  searching the data of an organization to complete some task.

Track coordinators: Nick Craswell, nickcr@microsoft.com
  Ian Soboroff, ian.soboroff@nist.gov
  Arjen de Vries, arjen@acm.org
  Track web page: http://www.ins.cwi.nl/projects/trec-ent
  Mailing list: send a mail message to listproc@nist.gov
  such that the body consists of the line
  subscribe trec-ent

Legal Track — The goal of the legal track is to develop search technology
  that meets the needs of lawyers to engage in effective discovery
  in digital document collections.

Track coordinators: Jason Baron, jason.baron@nara.gov
  Doug Oard, oard@umd.edu
  Track web page: http://trec-legal.umiacs.umd.edu
  Mailing list: contact oard@umd.edu to be added to the list.

Million Query Track — The goal of the “million query” track is to test
  the hypothesis that a test collection built from very many very
  incompletely judged topics is a better tool than a collection built
  using traditional TREC pooling.

Track web page: http://ciir.cs.umass.edu/research/million
  Mailing list: follow the instructions given on the track web page
  to join the email list ( million@cs.umass.edu )

Relevance Feedback Track — The goal of the relevance feedback track
  is to provide a framework for exploring the effects of different
  factors on the success of relevance feedback.

Track coordinators: Chris Buckley, cabuckley@sabir.com
  Steve Robertson, ser@microsoft.com
  Track web page: http://groups.google.com/group/trec-relfeed
  Mailing list: follow the instructions given on the track web page
  to join the email list

Conference Format
————————-

The conference itself will be used as a forum both for presentation
of results (including failure analyses and system comparisons),
and for more lengthy system presentations describing retrieval
techniques used, experiments run using the data, and other issues
of interest to researchers in information retrieval. As there
is a limited amount of time for these presentations, the TREC
program committee will determine which groups are asked to speak
and which groups will present in a poster session. Groups that
are interested in having a speaking slot during the workshop
should submit a 200-300 word abstract in September describing
the experiments they performed. The program committee will use
these abstracts to select speakers.

As some organizations may not wish to describe their proprietary
algorithms, TREC defines two categories of participation.

Category A: Full participation
  Participants will be expected to present full details of system
  algorithms and various experiments run using the data, either in
  a talk or in a poster session.

Category C: Evaluation only
  Participants in this category will be expected to submit results
  for common scoring and tabulation, and present their results in
  a poster session. They will not be expected to describe their
  systems in minute detail, but will be expected to describe the
  general approach and report on time and effort statistics.

Data
—-
Many of the existing TREC English collections (documents, topics,
and relevance judgments) are available for training purposes and
may also be used in some of the tracks. Parts of the training
collection (Disks 1-3) were assembled from Linguistic Data
Consortium (LDC) text, and a signed User Agreement will be required
from all participants. The documents are an assorted collection
of newspapers, newswires, journals, and technical abstracts.
The LDC has collected a more recent set of newswire material called
the AQUAINT collection (Disks 6&7); this collection will also
be available to TREC participants but is covered by a separate
User Agreement. A third Agreement is needed for the remaining
disks (4-5).

All documents are typical of those seen in a real-world situation
(i.e. there will not be arcane vocabulary, but there may be
missing pieces of text or typographical errors). For most tracks,
the relevance judgments against which each system’s output will be
scored will be made by experienced relevance assessors based on the
output of all TREC participants using a pooled relevance methodology.
See the Overview paper in the TREC-8 proceedings (on the TREC
web site) for a detailed discussion of pooling.
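
For readers who have not met pooling before, the idea can be sketched in a few lines: for each topic, take the top-ranked documents from every submitted run and send only the union of those documents to the assessors. The snippet below is a minimal illustration of that idea, not NIST's actual tooling; the run names, document IDs, and pool depth are hypothetical.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` documents of every run for one topic."""
    pool: Set[str] = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])  # only the head of each ranking is judged
    return pool


# Hypothetical ranked lists from three systems for a single topic.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
    "sysC": ["d7", "d1", "d2", "d8"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Documents outside the pool are never shown to an assessor; how they are treated when scoring is a separate decision that this sketch does not cover.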

Response format and submission details:
—————————————

Organizations wishing to participate in TREC 2008 should respond
to this call for participation by submitting an application.
IMPORTANT NOTE: Participants in previous TRECs who will participate
in TREC 2008 must submit a new application.

An application consists of the following five parts:
1. Contact information:
  * The full name of the main contact person from your organization.

* The full name of your organization. If you are not
  participating as a member of an organization,
  please specify “self”. If you know there is another group
  from your organization that will also participate in TREC 2008
  (for example, two groups from the same university),
  please give enough qualification in the organization name
  to distinguish the different groups.

* An organization/team name (up to 20 characters) used as a
  unique identifier for your group. You will need to use this
  identifier on correspondence to NIST (when requesting
  data or sending permission forms, for example), so
  remember it. This identifier will also be used to tag your
  runs when you submit them.

* Complete organization physical mail address—sufficient
  such that mail sent to that address will be accepted
  by the post office.

* Fully qualified phone and fax numbers for the main contact person.

* Fully qualified, valid email address for the main contact person.

* Exactly ONE fully qualified, valid email address to use
  for the TREC 2008 participants’ mailing list.
  Because of the overhead involved in maintaining the mailing list
  of active participants, only one email address per participating
  group will be added to the TREC 2008 participants’ list.
  We strongly encourage the use of a local mailing list at
  your institution that distributes TREC mail internally to
  project participants so that all involved see the mail sent
  to this list. TREC is run solely by email, so it is
  important that this address be valid and that mail
  sent to the address is read in a timely fashion.
  You may use the email address of the main contact person
  as the address for the mailing list, but please give
  it twice in the application so we are sure of your intentions.

2. Whether you have participated in TREC before. If so, please
  indicate the years you participated, otherwise indicate that you
  are new to TREC.

3. A one-paragraph description of your retrieval approach.

4. Whether you will participate as a Category A or a
  Category C group.

5. A list of tracks that you are likely to participate in.
  This is non-binding, but is helpful to know for planning.

There is no application form as such; a simple text file consisting
of this information by number is the application.
Please respond using only simple text (i.e., no pdf, word,
rtf, postscript, etc. files). We will not process your application
to participate in TREC 2008 unless it is complete.

All responses should be mailed to Lori Buckland, lori.buckland@nist.gov .
Any questions about conference participation, response format, etc.
should be sent to the general TREC email address, trec@nist.gov  .

AIRWeb 2008 Call for Papers

December 18, 2007

Search Engine Spammers, beware:

Here is the Call for Papers for The Fourth International Workshop on Adversarial Information Retrieval on the Web, to be held on April 22nd, 2008 in Beijing, China:

http://airweb.cse.lehigh.edu:80/2008/cfp.html

As in AIRWeb 2007, there will be a Web Spam Challenge. Let’s call it “Ethical Spamming”, a la “Ethical Hacking”. Indeed, to understand the spammer/hacker mentality you need to either act like one under controlled conditions or have been one in a previous life, so to speak.

Once again, I’ve been appointed a member of the Program Committee. To help promote the event, I’m reproducing below their Call for Papers.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is “search engine spamming” or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection.

We solicit both full and short papers on any aspect of adversarial information retrieval on the Web. Particular areas of interest include, but are not limited to:

Link spam
Content spam
Cloaking
Comment spam
Spam-oriented blogging
Click fraud detection
Reverse engineering of ranking algorithms
Web content filtering
Advertisement blocking
Stealth crawling
Malicious tagging

Proceedings of the workshop will be included in the ACM Digital Library. Full papers are limited to 8 pages; work-in-progress papers will be permitted 4 pages.

Web Spam Challenge
Last year we introduced a novel element at the workshop: a Web Spam Challenge for testing web spam detection systems. We will be holding the Web Spam Challenge again this year, using the WEBSPAM-UK2007 collection for Web Spam Detection which we anticipate being released in early January, 2008.

The collection includes a large set of web pages, a web graph, and human-provided labels for a set of hosts. We will also provide a set of features extracted from the contents and links in the collection, which may be used by the participant teams in addition to any automatic technique they choose to use.

We ask that participants of the Web Spam Challenge submit predictions (normal/spam) for all unlabeled hosts in the collection. Predictions will be evaluated and results will be announced at the AIRWeb 2008 workshop.

More information will be posted to http://webspam.lip6.fr/  

Timeline
15 February 2008: E-mail intention to submit a workshop paper (optional, but helpful)
22 February 2008: Deadline for workshop paper submissions
14 March 2008: Notification of acceptance of workshop papers
31 March 2008: Camera-ready copy due
31 March 2008: Challenge submissions due
22 April 2008: Date of workshop

Organizers and Program Committee
Organizers:

Carlos Castillo, Yahoo! Research
Kumar Chellapilla, Microsoft Live Labs
Dennis Fetterly, Microsoft Research

Program Committee:

Einat Amitay, IBM
András Benczúr, Hungarian Academy of Sciences
Paul-Alexandru Chiri, Uni Hannover
James Caverlee, Texas A&M University
Gordon Cormack, University of Waterloo
Nick Craswell, Microsoft Research
Matt Cutts, Google
Brian Davison, Lehigh University
Ludovic Denoyer, University Paris 6
Aaron D’Souza, Google
Edel García, Mi Islita.com
Natalie Glance, Nielsen BuzzMetrics
Antonio Gulli, Ask.com
Zoltán Gyöngyi, Stanford University
Monika Henzinger, Google
Pranam Kolari, Yahoo! Applied Research
Mark Manasse, Microsoft Research
Marc Najork, Microsoft Research
Alexandros Ntoulas, Microsoft Search Labs
Jan Pedersen, Yahoo! Search
Erik Selberg, Amazon
Torsten Suel, Polytechnic University
Mike Thelwall, University of Wolverhampton
Baoning Wu, Snap
Tao Yang, Ask.com

E-mail: airweb2008@cse.lehigh.edu

Call for Papers for ACL:HLT Conference

November 30, 2007

Dr. Ellen Voorhees, over at TREC (NIST.GOV), sent me this Call for Papers:

I would like to call the IR research community’s attention to the “ACL 2008: HLT” conference.  This year ACL is making a concerted effort to attract excellent Information Retrieval research papers.  I am serving as PC co-chair for IR, and am helped by Noriko Kando, David Carmel, and Elizabeth Liddy, who are serving as “area chairs” for IR.  We are putting together a group of reviewers who have solid experience in IR, both to ensure that good IR research is recognized and to prevent poor IR research from slipping through.

Please consider submitting your IR research to ACL:HLT this year.  The ACL connection means that there will be some bias toward papers that touch on language technologies such as NLP, speech recognition, machine translation, discourse, and so on.  However, “general” IR papers are entirely within scope, and areas with IR roots or connections are also encouraged: text mining, filtering, recommendation systems, question answering, classification, clustering, sentiment analysis, etc.

The submission deadline for ACL:HLT is January 10, 2008.  ACL will be held June 15-20 near Ohio State University in Columbus, Ohio.  (The discount airline Skybus flies there from a large number of places around the US, and guarantees 10 seats for $10 each on every flight.  Of course, they’re probably already taken, but it’s nice to contemplate.)

If you’re considering SIGIR instead or in addition, its submission deadline is January 28th [abstract due a week earlier].  SIGIR will be held July 20-24 in Singapore.  SIGIR’s acceptance rate has been running slightly below 20% lately.

The complete call for papers as well as other useful information is available at http://acl2008.org.
 

IPAM Workshop: Cyber-enabled Discovery & Innovation

September 26, 2007

Dr. Mark Green, from IPAM at UCLA, sent me this workshop announcement:

Dear colleagues,

I am writing to let you know about a 1-day workshop IPAM is holding on October 29 which the NSF has just asked us to organize as part of a coordinated effort by the NSF Mathematics Institutes.  The workshop is intended to aid those interested in writing proposals to understand the initiative better and to help them in finding collaborators.

The NSF is rolling out a major new initiative in late September on “Cyber-enabled Discovery and Innovation.” This will begin as a $50 million program the first year, and will grow over the next 5 years into a $250 million program.

The goal of this workshop is to inform the scientific community about the CDI program, with the aim of eliciting strong proposals involving mathematical scientists. This workshop will be focused on the “knowledge extraction” aspect of the CDI program. For more information about CDI, see

http://www.nsf.gov/news/news_summ.jsp?cntn_id=108366

We plan on having three types of presentations:

–An information session with Q&A with a representative from NSF

–Several panels where each panelist would present a few slides about what they consider to be the interesting and important questions of long-term significance, followed by a discussion with Q&A. Topics we envision at this point are:

(i) Numerical Methods for Fast Knowledge Extraction

(ii) Nonlinear Methods for Dimensional Reduction

(iii) Knowledge Extraction from Images and Problems of Visualization

(iv) Discrete and Graph-based Techniques for Knowledge Extraction and Analysis of Large Networks

–Selected examples of success stories of applying knowledge extraction techniques from the mathematical sciences to large scale problems

For the program webpage and an online application/registration form, see http://www.ipam.ucla.edu/programs/cdi2007/ .

 

Best regards,

Mark

European Conference on Digital Libraries (ECDL 2007)

September 11, 2007

The Computer and Automation Research Institute of the Hungarian Academy of Sciences (MTA SZTAKI) cordially invites you to participate in the 11th ECDL conference to be held in Budapest, Hungary on 16-21 September 2007.

Aims and Scope:

This unique and well-established series brings together researchers, developers and practitioners working in various disciplines and related areas of digital libraries all over the world. The conference will consist of a three-day technical program, preceded by a tutorial day and followed by workshops. The technical program will include refereed paper presentations, plenary sessions, panels and poster sessions. The tutorials will provide in-depth looks at areas of current interest.

ECDL 2007 will be devoted to discussions about hot issues and applications and will primarily provide a forum to reinforce the collaboration of researchers towards Ubiquitous Digital Libraries.

Topics include, but are not limited to:

Digital curation
Theoretical models for digital information management
Personal and personalized digital libraries
Concepts of Digital Libraries and digital content
Collection building, management and integration
System architectures, integration and interoperability
Information organization, search and usage
Multilingual information Access and Multimedia Information Handling
User interfaces for Digital Libraries
User studies and system evaluation
Digital archiving and preservation: methodological, technical and legal issues
Collaboration in DLs
Digital Library applications in e-science, e-learning, e-government, cultural heritage, etc.

For additional information, visit the conference site

ACM Document Engineering 2007 Conference

August 24, 2007

DocEng 2007 will be held at the University of Manitoba, Winnipeg, Canada, from August 28 to 31, 2007.

The ACM Symposium on Document Engineering is an annual international academic conference devoted to the dissemination of research on models, tools and processes that improve our ability to create, manage and maintain documents.

Documents are one of the centerpieces of globally interconnected systems that store information drawn from many media and deliver that information as required by users. A document may be stored in final presentation form or may be generated on-the-fly, undergoing substantial transformations in the process. Documents may include extensive hyperlinks, thereby permitting virtual documents, and also making available structured collections of information on which to anchor automated reasoning, such as promoted through the Semantic Web. Furthermore, document technologies like XML are having a profound impact on data modeling, in part because of the way these technologies bridge and integrate a variety of paradigms.

To attend next week’s conference, or for additional information, visit

http://www.cs.umanitoba.ca/~doceng07/

Reviewing Papers: How-To

August 16, 2007

As a reviewer of journal manuscripts and conference papers, I normally look to see if the piece before me answers the following questions:

1. WHAT-WHY: What is the scientific problem at hand and why is it important?
2. WHO-WHAT-WHY: Who proposed what previous solutions and why are these inadequate or incomplete?
3. WHAT-YOUR-WHY: What is your proposed solution and why is it better?
4. HOW-WHAT: How is the solution implemented and what are the benefits or practical applications?
5. PROS-CONS-WHAT: What are the possible pros and cons of your solution and what are the next areas of research?

(more…)

Computer Science & Engineering, Berlin

August 15, 2007

The Fourth International Conference on Computer Science and Engineering (CSE 2007) will be held in Berlin, Germany during August 24-26, 2007.

CSE 2007 aims to bring together researchers, scientists, engineers, and students to exchange and share their experiences, new ideas, and research results about all aspects of Computer Science and Engineering, and discuss the practical challenges encountered and the solutions adopted.

(more…)

Argentine Symposium on Artificial Intelligence

August 13, 2007

ASAI, the Argentine Symposium on Artificial Intelligence, is an annual event intended to be the main forum of the Artificial Intelligence (AI) community in Argentina. The symposium aims at providing a forum for researchers and AI community members to discuss and exchange ideas and experiences on diverse topics of AI. Previous ASAI editions stimulated presentations on both applications of AI and new tools and foundations currently under development.

(more…)

2007 SIGIR - The Conference Papers

August 7, 2007

The 30th Annual International ACM SIGIR Conference (23-27 July 2007, Amsterdam) ended more than 10 days ago. I didn’t have time to list the accepted papers/posters/demos until now. Here they are.

http://www.sigir2007.org has all the glory. While you are there, and if you have an account, check out Karen Sparck Jones’s online video.

(more…)

TREC 2008 Call for Papers

August 3, 2007

Dr. Ellen Voorhees, over at TREC (part of NIST.gov), sent me the TREC 2008 Call for Papers.

If you are a colleague, feel free to submit, or to forward the call to your college faculty across departments and disciplines.

(more…)

August 2007 IR Conferences

July 27, 2007

Here is a list of IR conferences scheduled for August, 2007:

(more…)

Interview with Vint Cerf

July 25, 2007

Omaya Sosa Pascual over at El Nuevo Dia published (07/22/07) an interview with Vint Cerf, conducted while he was visiting Puerto Rico for the ICANN Public Meeting. Like the newspaper, the interview is in Spanish. I had the privilege of attending the talk Dr. Cerf delivered at the Law School of the University of Puerto Rico.

(more…)

Information Retrieval Conferences - July 2007

July 20, 2007

Here is a list of IR conferences, scheduled for the rest of July in The Netherlands.

(more…)

Glasgow Summer School on Multimedia Semantics

July 18, 2007

The Glasgow Summer School on Multimedia Semantics, organized by the Information Retrieval Group at Glasgow University, is in full swing now (July 15-21, 2007).

(more…)

Call for Infonortics’s Search Engine Meeting

July 12, 2007

Harry Collier over at Infonortics emailed me yesterday this Call for Papers:

The Call for Papers for next year’s (April 2008) Search Engine Meeting in Boston has now been released. Offers of presentations are being invited for consideration. Absolute deadline for submission of offers is October 18, 2007 but the organizers stress that, with presentations limited to only 20 over the two days of the meeting, it is advisable to make contact as early as possible.

(more…)

A Week before Greatness

June 29, 2007

I just came from ICANN. Yesterday I attended presentations by Paul Twomey and by Vint Cerf, Google’s Chief Evangelist, at the Law School of the University of Puerto Rico. Very inspiring talks. A lot of representatives from ICANN were present.

(more…)

ICANN, Vint Cerf, and Paul Twomey in Puerto Rico

June 22, 2007

Next week is ICANN’s 29th International Public Meeting; 25-29 June, 2007, here in beautiful San Juan, Puerto Rico.

As part of the occasion, I just received an invite from the Law School of University of Puerto Rico to attend special presentations from two of my heroes: Vint Cerf and Paul Twomey.

(more…)

Harvard Law School Internet News and Conferences

June 8, 2007

I have received the May issue of Internet News from the Berkman Center for Internet & Society at Harvard Law School. They have a great list of upcoming conferences, which I’m reproducing below. Some of these are relevant to IR, while others are at the intersection of search technologies and Internet Law.

(more…)

Call for Papers to Knowledge Discovery Conference

June 4, 2007

Dr. Ellen Voorhees, Director of TREC, over at NIST.gov, informed me by email of this Call for Papers. Over the years, I have received invitations to several TREC tracks, and there is no doubt that the groups that make up these tracks are great places to be.

For those who want to submit a manuscript, here is the full Call:

(more…)

Upcoming IPAM Workshops

May 16, 2007

Dr. Mark Green, Director of the Institute for Pure and Applied Mathematics at UCLA (IPAM), informed me by email of the upcoming workshops IPAM is organizing. I met Dr. Green last year during a one-week workshop they organized (The Document Space Workshop).

I am listing below the new workshops relevant to search engines:

(more…)