Archive for the ‘Conferences’ Category

FIRE: Forum for Information Retrieval Evaluation

October 14, 2009

Ellen Voorhees, Director of TREC at NIST.Gov, sent me this Call for Participation, reproduced below to facilitate its dissemination:

CALL FOR PARTICIPATION

FIRE
(Forum for Information Retrieval Evaluation)
Workshop
DAIICT, Gandhinagar, India
19-21 February 2010

http://www.isical.ac.in/~fire

The success of TREC, CLEF, and NTCIR has clearly established the importance
of building reusable, large-scale standard test collections in Information
Access research. The aim of FIRE is to encourage research in Indian language
Information Access by creating a similar platform for Indian languages that
provides the data and a common forum for comparing models and techniques.

The Tasks:
==========
1) Ad-hoc monolingual document retrieval in Bengali, Hindi and Marathi.

2) Ad-hoc cross-lingual document retrieval
- documents in Bengali, Hindi, Marathi, and English,
- queries in Bengali, Hindi, Marathi, Tamil, Telugu and English.
- Bengali and Hindi topics will also be transliterated and made available
in Roman script. Adhoc monolingual task participants are encouraged to
submit runs using these queries as well.

3) Retrieval and classification from mailing lists and forums.
This is a pilot task being offered by IBM India Research Lab.

4)  Ad-hoc Wikipedia-entity retrieval from news documents
- Entities mined from English Wikipedia
- Query documents from English news website
This is a pilot task being offered by Yahoo! Labs, Bangalore.

Important Dates:
================
Ad-hoc monolingual and cross-lingual document retrieval:
Training data release   Aug 15 ‘09
Test data release       Nov 01 ‘09
Adhoc run submission    Nov 25 ‘09
Results released        Feb 01 ‘10

Retrieval and classification from mailing lists and forums:
Training data release   Oct 16 ‘09
Test data release       Nov 01 ‘09
Run submission          Nov 25 ‘09
Results declared        Feb 01 ‘10

Ad-hoc Wikipedia-entity retrieval from news documents:
Training data release   Oct 15 ‘09
Test data release       Nov 01 ‘09
Run submission          Nov 25 ‘09
Results declared        Feb 01 ‘10

Task Co-ordinators:
===================
Ad-hoc retrieval:
Pushpak Bhattacharyya (pb@cse.iitb.ac.in)
IIT Bombay
Dipasree Pal (dipasree_t@isical.ac.in)
ISI Kolkata

Retrieval and classification from mailing lists and forums:
Debapriyo Majumdar (debapriyo@in.ibm.com)
IBM India Research Lab
Ayan Bandyopadhyay (ayan_t@isical.ac.in)
ISI Kolkata

Ad-hoc Wikipedia-entity retrieval from news documents:
Ashwin Tengli (ashwint@yahoo-inc.com)
Yahoo! Labs, Bangalore
Pabitra Mitra (pabitra@cse.iitkgp.ernet.in)
IIT Kharagpur

Overall co-ordinators:
Prasenjit Majumder (p_majumder@daiict.ac.in)
DAIICT, Gandhinagar
Mandar Mitra (mandar@isical.ac.in)
ISI Kolkata

International Advisory Committee for FIRE:
==========================================
Amit Singhal, Google Fellow, USA
Carol Peters, ISTI-CNR, Italy
Christian Fluhr, CEA, France
Donna Harman, National Institute of Standards and Technology, USA
Doug Oard, University of Maryland, USA
Ee Peng Lim, Nanyang Technological University, Singapore
Ellen Voorhees, National Institute of Standards and Technology, USA
Fabrizio Sebastiani, ISTI-CNR, Italy
Gareth Jones, Dublin City University, Ireland
Hsin-Hsi Chen, National Taiwan University, Taipei, Taiwan
Hwee Tou Ng, National University of Singapore, Singapore
Iadh Ounis, University of Glasgow, UK
Ian Soboroff, National Institute of Standards and Technology, USA
Jacques Savoy, University of Neuchatel, Switzerland
James Allan, University of Massachusetts Amherst, USA
Krishna Kummamuru, IBM Research Lab, India
Mark Sanderson, University of Sheffield, UK
Mun Kew Leong, Institute for Infocomm Research, Singapore
Norbert Fuhr, University of Duisburg, Germany
Noriko Kando, National Institute of Informatics, Japan
Paul McNamee, Johns Hopkins University, USA
Prabhakar Raghavan, Yahoo! Research Labs, USA
Ricardo Baeza-Yates, Yahoo! Research Labs, Spain
Stephen Robertson, Microsoft Research, Cambridge, UK
Sung Hyon Myaeng, KAIST, South Korea
Tat-Seng Chua, National University of Singapore, Singapore
Tetsuya Sakai, Microsoft Research Asia, Beijing

IR Videos in Spanish

June 22, 2009

I normally do not put my lecture notes (ppt, pdf, videos) online. However, there are two public conferences that the event organizers taped. Both last over an hour and are in Spanish, but with slides in English. Here are the links. The quality of the videos is so-so.

Since the videos were made available a few months after the events, they are not properly dated. I have included below the actual dates of the events. If you don’t know Spanish, you are out of luck.

1. Understanding Search Engines (Entendiendo a los Buscadores), University of Puerto Rico, Bayamon, 4-23-2008

http://video.google.com/videoplay?docid=-653964730907023811

This one lasts for about two hours. The audience consisted of grad students and researchers. Unfortunately, the video has an audio-visual mismatch of about one slide. If you can cope with this, I hope you like it.

2. Demystifying LSI (Desmitificando LSI), OJOBuscador Congress, Madrid, Spain, 3-09-2007.

http://www.ojotube.com/videos/congreso-ojobuscador-2007-ponencia-desmitificando-lsi-de-dr-e-garcia/

This one lasts for over one hour. Since it was for a non-scientific audience (mostly Spanish SEOs), I tried to talk very slowly.

W3C 2009 Conference

March 26, 2009

Here is the final list making up the 18th International Conference of the W3C, WWW2009, of which AIRWeb2009 is a workshop.

http://www.webshine.org/2009reg.html

A lot of good stuff to please IRs, CS students, spammers/SEOs, and hackers.

SIDIM XXIV Conference

March 5, 2009

I am presenting at the Seminario Interuniversitario de Investigación en Ciencias Matemáticas (Interuniversity Seminar on Mathematical Sciences Research, SIDIM).

This is one of the most important activities held in Puerto Rico for the promotion of Mathematics research. (http://sidim2009.uprr.pr/)

This year SIDIM will be held at the University of Puerto Rico, Rio Piedras, on March 6-7, 2009. The SIDIM program and book of abstracts are available at http://sidim.uprh.edu/libroSIDIM2009.pdf

I will be presenting new research work on IDF and a new model for the conditional specificity of terms. If you have followed previous posts on the topic of inverse document frequency, now you will understand why I have dissected the topic several times. Thank you all for your private comments and feedback on the topic.

My abstract follows:

Scaled Inverse Document Frequency: A Model for the Evaluation of the Conditional Specificity of Query Terms in Search Engine Collections

Edel Garcia, Internet Business Development Center, Interamerican University of Puerto Rico, Metropolitan Campus

Inverse document frequency (IDF) is a measure of the specificity of query terms over a collection of D number of documents that has been successfully incorporated into numerous vector space information retrieval models. Since these models assume term independence, the specificity of a given term, present in different queries, is assumed to be unique and independent from other query terms. To the best of our knowledge, there are no known models that condition the specificity of terms to the presence of other terms in a query.

This paper proposes a new measure called scaled inverse document frequency (SIDF) which evaluates the conditional specificity of query terms over a subset S of D and without making any assumption about term independence. S can be estimated from search results, OR searches, or computed from inverted index data. We have evaluated SIDF values from commercial search engines by submitting queries relevant to the financial investment domain. Results compare favorably across search engines and queries. Our approach has practical applications for “real-world” scenarios like in Web Mining, Homeland Security, and keyword-driven marketing research scenarios. SIDF can be incorporated into a variety of information retrieval models as a global weight scoring system.

Keywords: inverse document frequency, conditional term specificity, web mining, search engines
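For readers who want something concrete, here is a minimal sketch of the idea behind the abstract. It is not the SIDF formula, which the abstract does not give. It only assumes the classic IDF, log(N/n), and then recomputes the same ratio over a subset S obtained with an OR search on the query terms. The toy collection and the function names `idf` and `or_subset` are made up for illustration.

```python
import math

def idf(term, docs):
    """Classic IDF of `term`: log(N / n), with N documents and n containing the term."""
    n = sum(1 for doc in docs if term in doc)
    if n == 0:
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log(len(docs) / n)

def or_subset(query_terms, docs):
    """Documents matching at least one query term (an OR search)."""
    return [doc for doc in docs if any(t in doc for t in query_terms)]

# Toy collection D: each document is a set of terms (hypothetical data).
D = [
    {"stock", "market"},
    {"stock", "bond"},
    {"bond", "yield"},
    {"weather"},
    {"sports"},
    {"stock"},
]

query = ["stock", "bond"]
S = or_subset(query, D)  # subset of D conditioned on the query

# Print each term's log-ratio over the full collection D and over the subset S.
for term in query:
    print(term, round(idf(term, D), 3), round(idf(term, S), 3))
```

Changing the query changes S, so the same term gets a different value in the second column. That dependence on the other query terms is the kind of effect the abstract is after, though the actual SIDF scaling is not reproduced here.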

AND 2009 Conference

February 17, 2009

L. Venkata Subramaniam, PhD, Manager – Information Processing and Analytics, IBM India Research Lab (http://lvs004.googlepages.com), sent us an email with some great news. They are holding the Third Workshop on Analytics for Noisy Unstructured Text Data (AND) on July 23-24, 2009, in Barcelona, Spain, and asked us to disseminate the news.

Copy of the email follows.

——————–

Dear Edel,

We are organizing the Third Workshop on Analytics for Noisy Unstructured Text Data on July 23-24, 2009, at Barcelona, Spain.

We know you work in related areas and would be happy to have you submit your research work to this workshop.

Also I request you to add AND 2009 to the blog you are maintaining at: http://irthoughts.wordpress.com/

I know many IR researchers visit your blog and through the blog we will be able to reach.

AND 2009: http://and2009workshop.googlepages.com/

This is the third in the series of workshops: AND 2007 at IJCAI 2007: http://research.ihost.com/and2007/

AND 2008 at SIGIR 2008:
http://and2008workshop.googlepages.com/

Both earlier workshops resulted in ACM proceedings and journal special issues. Here are some details of AND 09:
http://and2009workshop.googlepages.com/

Workshop Name: Third Workshop on Analytics for Noisy Unstructured Text Data (AND 09) in conjunction with ICDAR 09

Submission Date: 20 April 2009
Notification Date: 20 May 2009
Workshop Dates: 23-24 July 2009
Workshop Location: Barcelona, Spain

Regards
Venkat

————————-

So now you know. Start making plans to attend this great workshop. If you are visiting Madrid, swing by Barcelona for a few days. Don’t miss this unique opportunity.

AIRWeb2009 Call for Papers

January 8, 2009

Last night I received the following email from the organizing committee of AIRWeb2009 asking me to publicize the event:

Dear Edel,

Thank you again for agreeing to serve on the AIRWeb program committee. We have attached the AIRWeb CFP to this message and would appreciate your assistance in publicizing the workshop. The CFP is also available from the AIRWeb website – http://airweb.cse.lehigh.edu/2009/ .

Best regards,
Dennis Fetterly and Zoltan Gyongyi

This is my third year as a PC Member of AIRWeb. It is a lot of fun reviewing manuscripts to be presented at the event, months before the new anti-spamdexing and anti-adversarial IR practices are disseminated to the general public. Some spammers like to wait and follow what comes out of AIRWeb and then try workarounds. This is a continuous arms race and cat-and-mouse chase.

So, this post goes as follows:

AIRWeb is a series of international workshops focusing on Adversarial Information Retrieval on the Web that brings together both researchers and industry practitioners, to present and discuss advances in the state of the art.

AIRWeb’09 will be co-located with the WWW2009 conference in Madrid, Spain. The workshop proceedings will be made available in the ACM Digital Library.

Important Dates

6 February 2009: Deadline (optional, but helpful) for abstract submissions

13 February 2009: Deadline for paper submissions

20 or 21 April 2009: Date of the workshop

Incidentally, I am observing another new wave of spammers, marketers, and Johnny-come-latelies talking in “IR tongues” to gain some credibility from easy-to-impress folks. Ironically, their audience mostly consists of their peer SEOs. I guess the fight against spammers disguised as marketers never ends.

I wish I could beat the crap out of all these self-proclaimed SEO “experts” every single day through this blog. Fortunately, I have better things to do like conducting research, advising students, writing IRW The Newsletter, preparing a paper for SIDIM 2009, peer reviewing IR manuscripts, and (reality checks-to-pay-bills) taking on enterprise projects.

PC + Digital TV + Search Engines

November 17, 2008

Today I’ll be presenting at Interamerican University on search engine technologies and on a research project. I plan to cover how the PC + Digital TV fusion will provide an interesting platform for search engine marketing and research in general.

This is going to be fun.

Getting Ready for AIRWeb2009

October 13, 2008

For the last few years I have served as a PC member of AIRWeb. I just received and accepted an invitation to be a PC member for AIRWeb 2009.

For those of you not familiar with it, the International Workshop on Adversarial Information Retrieval on the Web (AIRWeb) http://airweb.cse.lehigh.edu/ has been held four times, in conjunction with WWW’05, SIGIR’06, WWW’07, and WWW’08.

Topics discussed at the workshops include all forms of search engine spamming and hacking practices. SEO spamming practices are exposed and countermeasures are tested. It is a lot of fun examining in advance manuscripts describing these malicious practices, months before the accepted papers hit mainstream.

Incidentally, the next issue of the IR Watch newsletter features Fraudulent Web Analytics, an article on adversarial techniques. We expose several practices spammers and hackers use to produce fake analytics and to defraud advertisers.

IPAM Upcoming Workshops

September 9, 2008

IPAM (Institute for Pure and Applied Mathematics at UCLA) sent us the current schedule of upcoming workshops. Back in January 2006 we attended the now famous Document Space Workshop and the experience there was a real nirvana. We had the opportunity to meet its then director, Dr. Mark Green, and a few other world-class researchers like Dr. Michael Berry, an expert in LSI.

IPAM now has a new director and associate director. According to them:

“Dr. Russel Caflisch, UCLA professor of mathematics, was appointed as IPAM Director on July 1, 2008. Dr. Jichun Li, an associate professor of mathematics at University of Nevada Las Vegas, joined the IPAM scientific staff in August; he will serve a two-year term as one of IPAM’s Associate Directors, along with Dr. Christian Ratsch. Please help us welcome Dr. Caflisch and Dr. Li to IPAM!”

We highly recommend that readers who can do so attend the IPAM workshops. For those interested in attending, the current schedule of events is given below.

Upcoming IPAM Long Programs:

 Each IPAM long program will involve a community of senior and junior researchers. The intent is for long-term participants to have an opportunity to learn about the topic of the program from the perspectives of many different fields and to meet a diverse group of people and have an opportunity to form new collaborations. In addition to these activities, there will be opening tutorials, four workshops (each one is listed under “upcoming workshops”), and a culminating workshop at Lake Arrowhead. Funding is available both to attend our entire 3-month program and to attend individual workshops; those interested are encouraged to apply through the website of the program that interests them.  Applications received at least six weeks in advance of the long program will receive fullest consideration.

 Internet Multi-Resolution Analysis

September 8 – December 12, 2008

http://www.ipam.ucla.edu/programs/mra2008/

 Quantum and Kinetic Transport Equations: Analysis, Computations, and New Applications

March 9 – June 12, 2009

http://www.ipam.ucla.edu/programs/kt2009/

Combinatorics: Methods and Applications in Mathematics and Computer Science

September 8 – December 11, 2009

http://www.ipam.ucla.edu/programs/cma2009/

Model and Data Hierarchies for Simulating and Understanding Climate

March 8 – June 11, 2010

Webpage will be posted soon.

Upcoming IPAM Workshops (through December 2009):

A registration form and an application for funding are available on each program’s webpage.  Applications received six weeks in advance of the workshop will receive fullest consideration.

Internet MRA Tutorials

September 9 – 12, 2008

http://www.ipam.ucla.edu/programs/mratut/

Multiscale Representation, Analysis and Modeling of Internet Data and Measurements

September 22 – 26, 2008

http://www.ipam.ucla.edu/programs/mraws1/

Applications of Internet MRA to Cyber-Security

October 13 – 17, 2008

http://www.ipam.ucla.edu/programs/mraws2/

Beyond Internet MRA: Networks of Networks

November 3 – 7, 2008

http://www.ipam.ucla.edu/programs/mraws3/

New Mathematical Frontiers in Network Multi-Resolution Analysis

November 17 – 21, 2008

http://www.ipam.ucla.edu/programs/mraws4/

Quantitative and Computational Aspects of Metric Geometry

January 12 – 16, 2009

http://www.ipam.ucla.edu/programs/mg2009/

Numerical Approaches to Quantum Many-Body Systems

January 22 – 30, 2009

(Three-day tutorials followed by five-day workshop)

http://www.ipam.ucla.edu/programs/qs2009/

Laplacian Eigenvalues and Eigenfunctions: Theory, Computation, Application

February 9 – 13, 2009

http://www.ipam.ucla.edu/programs/le2009/

Rare Events in High-Dimensional Systems

February 23 – 27, 2009

http://www.ipam.ucla.edu/programs/re2009/

Quantum and Kinetic Transport Equations: Tutorials

March 10 – 13, 2009

http://www.ipam.ucla.edu/programs/kttut/

Computational Kinetic Transport and Hybrid Methods

March 30 – April 3, 2009

http://www.ipam.ucla.edu/programs/ktws1/

The Boltzmann Equation: DiPerna-Lions Plus 20 Years

April 15 – 17, 2009

http://www.ipam.ucla.edu/programs/ktws2/

Flows and Networks in Complex Media

April 27 – May 1, 2009

http://www.ipam.ucla.edu/programs/ktws3/

Asymptotic Methods for Dissipative Particle Systems

May 18 – 22, 2009

http://www.ipam.ucla.edu/programs/ktws4/

Combinatorics Tutorials

September 9 – 16, 2009

Webpage will be posted soon.

Probabilistic Techniques and Applications

October 5 – 9, 2009

http://www.ipam.ucla.edu/programs/cmaws1/

Combinatorial Geometry

October 19 – 23, 2009

http://www.ipam.ucla.edu/programs/cmaws2/

Topics in Graphs and Hypergraphs

November 2 – 6, 2009

http://www.ipam.ucla.edu/programs/cmaws3/

Analytical Methods in Combinatorics, Additive Number Theory and Computer Science

November 16 – 20, 2009

http://www.ipam.ucla.edu/programs/cmaws4/

TREC 2009 Track Proposals

August 22, 2008

Yesterday I received an email from Dr. Ellen Voorhees, Chair, TREC Program Committee, informing us of the call for track proposals for TREC 2009. For those not familiar with it, TREC, which is part of NIST, is where proponents gather to explore new IR and search technologies, retrieval models, frameworks, etc.

Sorry, but SEO nonsense and hearsay are not allowed at TREC.

For additional information, visit http://trec.nist.gov/overview.html.

To disseminate the good news to our IR audience, I am reproducing the call below.

Dear TREC community,

TREC uses a track proposal mechanism to select the set of tracks to be run in a given TREC. We are currently soliciting proposals for tracks to include in TREC 2009. All candidate tracks (both existing and newly proposed) must submit a proposal by September 15, 2008. The submission deadline is in mid-September so that the TREC program committee will have time to make track selections before the TREC 2008 meeting in November. This allows the track discussions held at the TREC meeting to be informed as to the status of the track for the following year.

The criteria for judging a track proposal are as before: a strong advocate who is willing to be the track coordinator (track coordinator is a volunteer position); a large enough core of interested researchers to make the track viable; the availability of sufficient resources such as appropriate corpora and assessors with expertise in the area; and the fit with other tracks.

Proposals need to contain enough information for the PC to assess the criteria above. Proposals should contain an explicit statement of the goals of the track (i.e., what is expected to be learned and/or what infrastructure would be created if the track were run). If relevance judging (or some similar sort of annotation) is required, the proposal needs to include where the judging would occur (NIST or elsewhere?), any special qualifications the assessors would need (special domain expertise required?), as well as an estimate of the amount of time such assessing would require. Any special constraints on the document sets needed should also be noted. Finally, proposals must contain full contact details of the proposer.

Proposals should be sent as either a postscript, PDF, or ASCII document to trec@nist.gov.

Ellen Voorhees
Chair, TREC program committee

For SEO Spammers: AIRWeb 2008 Presentations

April 29, 2008

To facilitate mainstream dissemination of the manuscripts presented at AIRWeb 2008, here are the papers as listed over at http://airweb.cse.lehigh.edu/2008/program.html

SEO spammers, whether your life gravitates around a “social network circus” or “link building” or not, it is time to revisit your drawing board.

8:30 – 10:00

10:30 – 12:00

13:30 – 15:00

15:30 – 17:00

  • Web Spam Challenge
    • (5 min.) Description of the challenge
    • (12 min.) Data Analysis School, Moscow (slides)
      Konstantin Bauman, Alexey Brodskiy, Sergey Kacher, Elmira Kalimulina, Ruslan Kovalev, Mikhail Lebedev, Dmitry Orlov, Pavel Sushin, Pavel Zryumov, Dmitry Leshchiner and Ilya Muchnik
    • (12 min.) Computer and Automation Research Institute, Hungarian Academy of Sciences (slides)
      David Siklosi, Andras Benczur
    • (12 min.) Institute of Automation, Chinese Academy of Sciences, Beijing (slides)
      Guanggang Geng, Xiaobo Jin and Chunheng Wang
    • (5 min.) Announcement of results
  • Panel
    • (45 min.): The Future of Adversarial IR on the Web
      Amit Aggarwal, Zoltán Gyöngyi, Alexandros Ntoulas, Erik Selberg, and Andrew Tomkins

Few Rants: Microsoft, a Conference, and a Database Site

April 11, 2008

I normally don’t rant on this blog about trivial stuff in life since this blog is about IR and search engine research. Today I feel like making an exception. So let’s see how I can tie a few rants about silly everyday things to search engines.

Rant 1

I bought the Home and Student version of Microsoft Office ($122, through Costco). The learning curve started. I tried to open its case by just pulling off the red tab as suggested. The red tab detached from the case and still there was no “open Sesame”. I then tried different things until I decided to slice the clear seal at the top of the case with a knife and voila! Nothing like a Puerto Rican solution for a “Made in Puerto Rico” Windows Vista product! Duh!

So the recipe is: (1) get a knife, (2) slice the seal, and (3) pull the case indentations toward your right with your fingers. The inside case should open.

Out of curiosity I wanted to know if others out there struggled with the design of the case. I ended up googling for how to open windows office case and found this site which discussed the very same problem and the very same solution. I realized I was not alone.

There are now dozens of sites like this one that show users this dumb “how-to”. Many are complaining about the “brilliant” design of the box, which is just a usability and accessibility nightmare.

Read what others at the aforementioned site are commenting. Some commented that they ended up searching for:

open office 2007 box
open vista box
Office Packaging “how to open”
open microsoft office box
how to open MS office 2007 box

Something on the product design side is wrong when soooo many have to Google for just how to open the damn case of a Microsoft product, or of any product for that matter. Something is wrong when Microsoft lab rats have to explain online how to open the annoying case.

Rant 2

There is a local conference on information security I was invited to. Down the organization pipeline, something is wrong with a conference when its organizers have to chase potential presenters one week before the event. I pass and wish them good luck.

Rant 3

There is a local company that created a database-driven site for the upcoming Elections. The problem: how to get politicians and average users to know how to use the technology. Also, the site already needs to be redesigned so it can rank high and gain traffic from search engine users.

All of these kind of belong in the Land of Duh.

Demystifying LSI Video

April 7, 2008

Here is a video of my presentation, Demystifying LSI, at the OJOBuscador Congress 2.0, Madrid, Spain, 2007. One year later, nothing has changed. Many of the same crook SEOs exposed during the congress are still deceiving the public about what LSI is.

Unfortunately, the video quality and lighting are not good enough to see the PDF slides, and the presentation is in Spanish. Since attendees were not scientists, I talked very slowly for over an hour.

Want to get bored for the next hour? View the video.

Thanks to N. Valenzuela Alonso, Director of SEO and Search Engine Marketing of Media Bit, S.L. for the link (www.ithinksearch.com/2008/03/31/video-lsi-de-edel-garcia-desmitificando-lsi/).

Here is also the presentation of Carlos Castillo (Chato), from Yahoo! Research Spain:

Adversarial IR with Web Spam, parts 1 and 2 
(http://www.ojobuscador.com/2007/06/14/ir-con-adversario-y-webspam-videopost/).

I had a great time talking with Carlos, a former grad student of Ricardo Baeza-Yates.

Baeza-Yates, Andrei Broder, Gerard Salton, Keith van Rijsbergen, and a few others have helped to shape what is today known as Information Retrieval research.

Talking about Andrei Broder (one of the main researchers behind the old mighty Altavista), here is also a great interview, thanks to the ojobuscador site:
http://www.ojobuscador.com/2006/05/20/entrevista-a-andrei-broder/

 

Understanding Search Engines

April 3, 2008

On April 23rd I’ll be presenting a seminar lecture at the University of Puerto Rico, Bayamon (http://www.uprb.edu).

Topic, time, abstract, level, and requirements follow.

Topic: Understanding Search Engines

Time: 12 Noon

Abstract: What are search engines? How do they work? What are their main components? How do they analyze document relevancy? What does it take to rank a web page in the top 10 search results of Google, Yahoo, and other search engines? These questions will be addressed in this conference.

Level: Beginners.
Requirements: None.

If you are a UPRB student, faculty member, or staff member and happen to be an SEO or webmaster, don’t miss this rare opportunity to learn answers to these and similar questions.

I will also use that opportunity to promote the upcoming conference we are co-organizing at Polytechnic University:

Search Engines and Information Security

This will be held on October 3 & 4, 2008. Additional information will be provided soon at the PUPR.edu and Mi Islita.com sites.

AIRWeb-2008 Last Call for Papers & Extension Deadline

February 25, 2008

AIRWEB organizers have instructed me to disseminate the following Final Call For Papers and deadline extension. Let’s fight the spammers and those disguised as SEOs.

We have extended the submission deadline until March 2, 2008. We would appreciate any assistance in disseminating this extension. The text version of the final call for papers is below and the pdf version is attached.

Best regards.
Carlos Castillo
Kumar Chellapilla
Dennis Fetterly

FINAL CALL FOR PAPERS and 9 day extension
Fourth International Workshop on
Adversarial Information Retrieval on the Web
http://airweb.cse.lehigh.edu/2008/

IMPORTANT DATES

02/March/2008 : Deadline for research articles
31/March/2008 : Deadline for challenge submissions
22/April/2008 : Workshop at the WWW 2008 conference in Beijing, China

Contents:

1. AIRWeb’08 Topics
2. Web Spam Challenge
3. Timeline
4. Organizers and Program Committee

1. AIRWEB’08 TOPICS

Adversarial Information Retrieval addresses tasks such as gathering,
indexing, filtering, retrieving and ranking information from
collections wherein a subset has been manipulated maliciously. On the
Web, the predominant form of such manipulation is “search engine
spamming” or spamdexing, i.e., malicious attempts to influence the
outcome of ranking algorithms, aimed at getting an undeserved high
ranking for some items in the collection.

We solicit both full and short papers on any aspect of adversarial
information retrieval on the Web. Particular areas of interest
include, but are not limited to:

* Link spam
* Content spam
* Cloaking
* Comment spam
* Spam-oriented blogging
* Click fraud detection
* Reverse engineering of ranking algorithms
* Web content filtering
* Advertisement blocking
* Stealth crawling
* Malicious tagging
* Ping spam

Proceedings of the workshop will be included in the ACM Digital
Library. Full papers are limited to 8 pages; work-in-progress will be
permitted 4 pages. Papers should be formatted using the WWW2008
proceedings style and submitted via
http://www.easychair.org/conferences/?conf=airweb2008

For more information, see http://airweb.cse.lehigh.edu/2008/

2. WEB SPAM CHALLENGE

Last year we introduced a novel element at the workshop: a Web Spam
Challenge for testing web spam detection systems. We will be holding
the Web Spam Challenge again this year, using the WEBSPAM-UK2007
collection for Web Spam Detection http://www.yr-bcn.es/webspam

The collection includes a large set of web pages, a web graph, and
human-provided labels for a set of hosts. We will also provide a set
of features extracted from the contents and links in the collection,
which may be used by the participant teams in addition to any
automatic technique they choose to use.

We ask that participants of the Web Spam Challenge submit predictions
(normal/spam) for all unlabeled hosts in the collection. Predictions
will be evaluated and results will be announced at the AIRWeb 2008
workshop.

For more information, see

3. TIMELINE

- 15 February 2008: E-mail intention to submit a workshop paper
  (optional, but helpful)
- 02 March 2008: Deadline for workshop paper submissions (all day
- 24 March 2008: Notification of acceptance of workshop papers
- 31 March 2008: Challenge submissions due
- 07 April 2008: Camera-ready copy due
- 22 April 2008: Date of workshop

4. ORGANIZERS AND PROGRAM COMMITTEE

Organizers

- Carlos Castillo, Yahoo! Research
- Kumar Chellapilla, Microsoft Live Labs
- Dennis Fetterly, Microsoft Research

Program Committee

- Einat Amitay, IBM
- Andras Benczur, Hungarian Academy of Sciences
- Paul-Alexandru Chirita, Uni Hannover
- James Caverlee, Texas A&M University
- Gordon Cormack, University of Waterloo
- Nick Craswell, Microsoft Research
- Matt Cutts, Google
- Brian Davison, Lehigh University
- Ludovic Denoyer, University Paris 6
- Aaron D’Souza, Google
- Edel Garcia, Mi Islita.com
- Natalie Glance, Nielsen BuzzMetrics
- Antonio Gulli, Ask.com
- Zoltan Gyongyi, Stanford University
- Monika Henzinger, Google
- Pranam Kolari, Yahoo! Applied Research
- Mark Manasse, Microsoft Research
- Marc Najork, Microsoft Research
- Alexandros Ntoulas, Microsoft Search Labs
- Jan Pedersen, Yahoo! Search
- Erik Selberg, Amazon
- Torsten Suel, Polytechnic University
- Mike Thelwall, University of Wolverhampton
- Baoning Wu, Snap
- Tao Yang, Ask.com

Search Engines for Penetration Testing

February 21, 2008

Well, I’m getting ready for my talk this afternoon at the University of Turabo. I’ve organized the talk in three parts:

 Part 1: Spam and Fraud through Search Engines

Part 2: Gathering Intelligence through Search Engines

Part 3: Identity Theft through Search Engines

A disclaimer will be necessary to indicate that the information to be presented is for educational purposes only.

This is gonna be a nice one. I hope to see old friends.

Web Mining, Search Engines, and Information Security

February 15, 2008

This Thursday the 21st I’ll be presenting the following talk before the faculty of the University of Turabo, Gurabo, PR:

Web Mining, Search Engines, and Information Security

I hope to see old friends there. Here is the abstract of my talk:

Web Mining is a research area of Data Mining wherein the Web is the “database” and search engines are the “user interface”. End-users can resort to search engines for all sorts of things. For instance, marketers can use search engines to gain traffic by ranking Web pages high for specific queries, hence enhancing the online presence of businesses, products, and services (search engine optimization, SEO). Spammers can inundate search engine indexes to deceive searchers (spamdexing). Hackers can attempt to rank high documents that lead to security risks (hacketers, hacketering) or use all forms of injection (links, forms, scripts, redirections, etc). Terrorists and criminals can use search engines to commit all sorts of crime-enabling activities, for instance, by stealing private information like SSNs, passwords, and student and user IDs, gaining access to “private” documentation, stalking people, etc.

This talk covers these and other aspects of search engines: the Good, the Bad, and the Ugly. The speaker will then talk about his own research projects in the area of Web Mining, Search Engines, and Intelligence. A disclaimer will be necessary to indicate that the information to be presented is for educational purposes only.

ECIR Tutorial on Terrier

February 12, 2008

Craig Macdonald, University of Glasgow, sent me the Call for Participation given below. Looking at the keynote speakers, I see that the great Amit Singhal, from Google, will be there. These days, when many marketers are busy trying to manipulate PageRank through nofollow gymnastics, they all miss the bigger picture and the work of the faces behind it.

I wish I could be at ECIR.

ECIR 2008 Tutorial – Researching and building IR applications using Terrier

Description:
This tutorial introduces the main design of a large-scale IR system, and uses the Terrier platform as an example of how one should be built. We detail the architecture and data structures of Terrier, as well as the weighting models included, and describe, with examples, how Terrier can be used to perform experiments and extended to facilitate new research and applications.

Handouts containing slides, a Terrier “crib sheet”, and detailed
examples of implementations of common research problems will be
provided, in addition to a bibliography of informative related papers.

For more details, see http://ecir2008.dcs.gla.ac.uk/tutorial_rb.html.

Organisers:
Craig Macdonald, University of Glasgow, UK
Ben He, University of Glasgow, UK

Registration:
http://ecir2008.dcs.gla.ac.uk/registration.html

ECIR 2008 2nd Call for Papers

February 6, 2008

Dr. Ellen Voorhees, from TREC over at NIST, sent me this Call for Papers:

Subject: ECIR 2008 Call For Participation

2nd CALL FOR PARTICIPATION

*Full Programme and Accommodation Details Now Online*
** Early Bird Registration Deadline: 29th February 2008 **

30th European Conference on Information Retrieval (ECIR 2008)
30th March – 3rd April
University of Glasgow
http://ecir2008.dcs.gla.ac.uk/

Organized by: University of Glasgow
In cooperation with: BCS-IRSG, ACM SIGIR

* Conference programme (31st March – 2nd April)
The technical program will include 33 full research papers, 19 short papers, and 28 posters. The full programme is available at
http://ecir2008.dcs.gla.ac.uk/programme.html

* Tutorials and workshops (30th March)
http://ecir2008.dcs.gla.ac.uk/tutorials_workshops.html
ECIR 2008 will host three half-day tutorials and three full-day workshops:
Tutorial: Advanced language modeling approaches (case study: expert search) – Djoerd Hiemstra, University of Twente, The Netherlands
Tutorial: Search and discovery in user-generated text content – Maarten de Rijke and Wouter Weerkamp, ISLA, University of Amsterdam, The Netherlands
Tutorial: Researching and building IR applications using Terrier – Craig Macdonald and Ben He, University of Glasgow, UK
Workshop: Workshop on novel methodologies for evaluation in Information Retrieval – organized by Mark Sanderson, University of Sheffield, UK, Martin Braschler, Zurich University of Applied Sciences, Switzerland, Nicola Ferro, University of Padova, Italy, and Julio Gonzalo, UNED, Spain
Workshop: Efficiency issues in Information Retrieval Workshop – organized by Roi Blanco, Universidade da Coruña, Spain and Fabrizio Silvestri, ISTI CNR, Italy
Workshop: Exploiting semantic annotations in Information Retrieval – Omar Alonso, A9.com, USA and Hugo Zaragoza, Yahoo! Research Barcelona, Spain

* Industry day (3rd April)
http://ecir2008.dcs.gla.ac.uk/industry.html
A day of presentations and discussion dedicated to the interests and needs of Information Retrieval practitioners.

* Keynote speakers
ECIR 2008 is pleased to announce the following three keynote speakers:
Professor Nicholas J. Belkin, Rutgers University, USA – Some(what) Grand Challenges for Information Retrieval
Professor Bettina Berendt, K.U. Leuven, Belgium – You are a document too: Web mining and IR for next-generation information literacy
Dr Amit Singhal, Google Research Fellow, Google, USA – Web Search: Challenges and Directions

* Paper awards
Best paper award, best student paper award and best poster award, all sponsored by Yahoo! Research

* Social events
A social event every day, including a Banquet reception on 31st March at the Kelvingrove Museum and a civic reception at the City Chambers on 1st April. Further details will be posted on the ECIR 2008 Web site.

* Registration
For registration and other information, please visit the ECIR 2008
home page at http://ecir2008.dcs.gla.ac.uk/registration.html   

* Main sponsors
Google  http://www.google.com     
Microsoft Research  http://research.microsoft.com    
Yahoo! Research  http://research.yahoo.com     
MatrixWare Information Services  http://www.matrixware.com    

We are looking forward to welcoming you in Glasgow,

The ECIR 2008 Chairs
(Ian Ruthven, Vassilis Plachouras, Ryen White and Iadh Ounis)

TREC 2008 Call for Papers

December 19, 2007

Ellen Voorhees, TREC Project Manager and Group Manager over at NIST.gov, sent me the TREC 2008 Call for Papers. To facilitate dissemination and promote the event, I’m reproducing the Call below.

CALL FOR PARTICIPATION

TEXT RETRIEVAL CONFERENCE (TREC)

February 2008 – November 2008

Conducted by:
  National Institute of Standards and Technology (NIST)

With support from:
  Intelligence Advanced Research Projects Activity (IARPA)

The Text Retrieval Conference (TREC) workshop series encourages
research in information retrieval and related applications by
providing a large test collection, uniform scoring procedures,
and a forum for organizations interested in comparing their
results. Now in its seventeenth year, the conference has become
the major experimental effort in the field. Participants in
the previous TREC conferences have examined a wide variety
of retrieval techniques and retrieval environments,
including cross-language retrieval, retrieval of web documents,
multimedia retrieval, and question answering. Details about TREC
can be found at the TREC web site, http://trec.nist.gov .

You are invited to participate in TREC 2008. TREC 2008 will
consist of a set of tasks known as “tracks”. Each track focuses
on a particular subproblem or variant of the retrieval task as
described below. Organizations may choose to participate in any or
all of the tracks. For most tracks, training and test materials are
available from NIST; a few tracks will use special collections that
are available from other organizations for a fee.

Dissemination of TREC work and results other than in the (publicly
available) conference proceedings is welcomed, but the conditions of
participation specifically preclude any advertising claims based
on TREC results. All retrieval results submitted to NIST are
published in the Proceedings and are archived on the TREC web site.
The workshop in November is open only to participating groups that
submit retrieval results for at least one track and to selected
government personnel from sponsoring agencies.

Schedule:
——–

By February 21 — submit application described below
  to NIST. Returning an application will add you to
  the active participants’ mailing list. On Feb 25,
  NIST will announce a new password for the “active
  participants” portion of the TREC web site. Included
  in this portion of the web site is information regarding
  the permission forms needed to obtain the TREC document
  disks.

Beginning March 1 — document disks used in some existing
  TREC collections distributed to participants who have
  returned the required forms. Please note that no disks
  will be shipped before March 1.

mid-July–mid-August — results submission deadline for most tracks
  (Results deadline may need to be even earlier for
  some tracks depending on assessor availability.
  Specific deadlines for each track will be included in
  the track guidelines, which will be finalized in the spring.)

September 9 (estimated) — speaker proposals due at NIST.

September 30 (estimated) — relevance judgments and individual
  evaluation scores due back to participants.

Nov 18-21 — TREC 2008 conference at NIST in Gaithersburg, Md. USA

Task Description:
—————-
Below is a brief summary of the tasks. Complete descriptions of
tasks performed in previous years are included in the Overview
papers in each of the TREC proceedings (in the Publications section
of the web site).

The exact definition of the tasks to be performed in each track for
TREC 2008 is still being formulated. Track discussion takes place
on the track mailing list. To be added to a track mailing list,
follow the instructions for contacting that mailing list as
given below. For questions about the track, send mail to the
track coordinator (or post the question to the track mailing list
once you join).

TREC 2008 will have one new track and four continuing tracks.
The new track is the relevance feedback track, a track
that will systematically explore the factors that affect
relevance feedback behavior. The blog, enterprise, legal, and
million query tracks will continue in TREC 2008, though the
specific tasks in a track may differ from year to year.
(Note that the QA track has been moved to the new Text Analysis
Conference (TAC); the call for participation in TAC
will be sent to the TREC friends mailing list.)

Blog Track — The purpose of the blog track is to explore information
  seeking behavior in the blogosphere.

Track coordinator: Iadh Ounis, ounis@cs.gla.ac.uk
  Mailing list: send a mail message to listproc@nist.gov
  such that the body consists of the line
  subscribe trec-blog

Enterprise Track — The purpose of the enterprise track is
  to study enterprise search: satisfying a user who is
  searching the data of an organization to complete some task.

Track coordinators: Nick Craswell, nickcr@microsoft.com
  Ian Soboroff, ian.soboroff@nist.gov
  Arjen de Vries, arjen@acm.org
  Track web page: http://www.ins.cwi.nl/projects/trec-ent
  Mailing list: send a mail message to listproc@nist.gov
  such that the body consists of the line
  subscribe trec-ent

Legal Track — The goal of the legal track is to develop search technology
  that meets the needs of lawyers to engage in effective discovery
  in digital document collections.

Track coordinators: Jason Baron, jason.baron@nara.gov
  Doug Oard, oard@umd.edu
  Track web page: http://trec-legal.umiacs.umd.edu
  Mailing list: contact oard@umd.edu to be added to the list.

Million Query Track — The goal of the “million query” track is to test
  the hypothesis that a test collection built from very many very
  incompletely judged topics is a better tool than a collection built
  using traditional TREC pooling.

Track web page: http://ciir.cs.umass.edu/research/million
  Mailing list: follow the instructions given on the track web page
  to join the email list ( million@cs.umass.edu )

Relevance Feedback Track — The goal of the relevance feedback track
  is to provide a framework for exploring the effects of different
  factors on the success of relevance feedback.

Track coordinators: Chris Buckley, cabuckley@sabir.com
  Steve Robertson, ser@microsoft.com
  Track web page: http://groups.google.com/group/trec-relfeed
  Mailing list: follow the instructions given on the track web page
  to join the email list

Conference Format
————————-

The conference itself will be used as a forum both for presentation
of results (including failure analyses and system comparisons),
and for more lengthy system presentations describing retrieval
techniques used, experiments run using the data, and other issues
of interest to researchers in information retrieval. As there
is a limited amount of time for these presentations, the TREC
program committee will determine which groups are asked to speak
and which groups will present in a poster session. Groups that
are interested in having a speaking slot during the workshop
should submit a 200-300 word abstract in September describing
the experiments they performed. The program committee will use
these abstracts to select speakers.

As some organizations may not wish to describe their proprietary
algorithms, TREC defines two categories of participation.

Category A: Full participation
  Participants will be expected to present full details of system
  algorithms and various experiments run using the data, either in
  a talk or in a poster session.

Category C: Evaluation only
  Participants in this category will be expected to submit results
  for common scoring and tabulation, and present their results in
  a poster session. They will not be expected to describe their
  systems in minute detail, but will be expected to describe the
  general approach and report on time and effort statistics.

Data
—-
Many of the existing TREC English collections (documents, topics,
and relevance judgments) are available for training purposes and
may also be used in some of the tracks. Parts of the training
collection (Disks 1-3) were assembled from Linguistic Data
Consortium (LDC) text, and a signed User Agreement will be required
from all participants. The documents are an assorted collection
of newspapers, newswires, journals, and technical abstracts.
The LDC has collected a more recent set of newswire material called
the AQUAINT collection (Disks 6&7); this collection will also
be available to TREC participants but is covered by a separate
User Agreement. A third Agreement is needed for the remaining
disks (4-5).

All documents are typical of those seen in a real-world situation
(i.e. there will not be arcane vocabulary, but there may be
missing pieces of text or typographical errors). For most tracks,
the relevance judgments against which each system’s output will be
scored will be made by experienced relevance assessors based on the
output of all TREC participants using a pooled relevance methodology.
See the Overview paper in the TREC-8 proceedings (on the TREC
web site) for a detailed discussion of pooling.

Response format and submission details:
—————————————

Organizations wishing to participate in TREC 2008 should respond
to this call for participation by submitting an application.
IMPORTANT NOTE: Participants in previous TRECs who will participate
in TREC 2008 must submit a new application.

An application consists of the following five parts:
1. Contact information:
  * The full name of the main contact person from your organization.

* The full name of your organization. If you are not
  participating as a member of an organization,
  please specify “self”. If you know there is another group
  from your organization that will also participate in TREC 2008
  (for example, two groups from the same university),
  please give enough qualification in the organization name
  to distinguish the different groups.

* An organization/team name (up to 20 characters) used as a
  unique identifier for your group. You will need to use this
  identifier on correspondence to NIST (when requesting
  data or sending permission forms, for example), so
  remember it. This identifier will also be used to tag your
  runs when you submit them.

* Complete organization physical mail address—sufficient
  such that mail sent to that address will be accepted
  by the post office.

* Fully qualified phone and fax numbers for the main contact person.

* Fully qualified, valid email address for the main contact person.

* Exactly ONE fully qualified, valid email address to use
  for the TREC 2008 participants’ mailing list.
  Because of the overhead involved in maintaining the mailing list
  of active participants, only one email address per participating
  group will be added to the TREC 2008 participants’ list.
  We strongly encourage the use of a local mailing list at
  your institution that distributes TREC mail internally to
  project participants so that all involved see the mail sent
  to this list. TREC is run solely by email, so it is
  important that this address be valid and that mail
  sent to the address is read in a timely fashion.
  You may use the email address of the main contact person
  as the address for the mailing list, but please give
  it twice in the application so we are sure of your intentions.

2. Whether you have participated in TREC before. If so, please
  indicate the years you participated, otherwise indicate that you
  are new to TREC.

3. A one-paragraph description of your retrieval approach.

4. Whether you will participate as a Category A or a
  Category C group.

5. A list of tracks that you are likely to participate in.
  This is non-binding, but is helpful to know for planning.

There is no application form as such; a simple text file consisting
of this information by number is the application.
Please respond using only simple text (i.e., no pdf, word,
rtf, postscript, etc. files). We will not process your application
to participate in TREC 2008 unless it is complete.

All responses should be mailed to Lori Buckland, lori.buckland@nist.gov .
Any questions about conference participation, response format, etc.
should be sent to the general TREC email address, trec@nist.gov  .

AIRWeb 2008 Call for Papers

December 18, 2007

Search Engine Spammers, beware:

Here is the Call for Papers for The Fourth International Workshop on Adversarial Information Retrieval on the Web, to be held on April 22nd, 2008 in Beijing, China:

http://airweb.cse.lehigh.edu:80/2008/cfp.html

As in AIRWeb 2007, there will be a Web Spam Challenge. Let’s call it “Ethical Spamming”, a la “Ethical Hacking”. Indeed, to understand the spammer/hacker mentality you need to either act like one under controlled conditions or have been one in a previous life, so to speak.

Once again, I’ve been appointed member of the Program Committee. To help promote the event, I’m reproducing below their Call for Papers.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is “search engine spamming” or spamdexing, i.e., malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection.

We solicit both full and short papers on any aspect of adversarial information retrieval on the Web. Particular areas of interest include, but are not limited to:

Link spam
Content spam
Cloaking
Comment spam
Spam-oriented blogging
Click fraud detection
Reverse engineering of ranking algorithms
Web content filtering
Advertisement blocking
Stealth crawling
Malicious tagging

Proceedings of the workshop will be included in the ACM Digital Library. Full papers are limited to 8 pages; work-in-progress papers will be permitted 4 pages.

Web Spam Challenge
Last year we introduced a novel element at the workshop: a Web Spam Challenge for testing web spam detection systems. We will be holding the Web Spam Challenge again this year, using the WEBSPAM-UK2007 collection for Web Spam Detection which we anticipate being released in early January, 2008.

The collection includes a large set of web pages, a web graph, and human-provided labels for a set of hosts. We will also provide a set of features extracted from the contents and links in the collection, which may be used by the participant teams in addition to any automatic technique they choose to use.

We ask that participants of the Web Spam Challenge submit predictions (normal/spam) for all unlabeled hosts in the collection. Predictions will be evaluated and results will be announced at the AIRWeb 2008 workshop.

More information will be posted to http://webspam.lip6.fr/  

Timeline
15 February 2008: E-mail intention to submit a workshop paper (optional, but helpful)
22 February 2008: Deadline for workshop paper submissions
14 March 2008: Notification of acceptance of workshop papers
31 March 2008: Camera-ready copy due
31 March 2008: Challenge submissions due
22 April 2008: Date of workshop

Organizers and Program Committee
Organizers:

Carlos Castillo, Yahoo! Research
Kumar Chellapilla, Microsoft Live Labs
Dennis Fetterly, Microsoft Research

Program Committee:

Einat Amitay, IBM
András Benczúr, Hungarian Academy of Sciences
Paul-Alexandru Chirita, Uni Hannover
James Caverlee, Texas A&M University
Gordon Cormack, University of Waterloo
Nick Craswell, Microsoft Research
Matt Cutts, Google
Brian Davison, Lehigh University
Ludovic Denoyer, University Paris 6
Aaron D’Souza, Google
Edel García, Mi Islita.com
Natalie Glance, Nielsen BuzzMetrics
Antonio Gulli, Ask.com
Zoltán Gyöngyi, Stanford University
Monika Henzinger, Google
Pranam Kolari, Yahoo! Applied Research
Mark Manasse, Microsoft Research
Marc Najork, Microsoft Research
Alexandros Ntoulas, Microsoft Search Labs
Jan Pedersen, Yahoo! Search
Erik Selberg, Amazon
Torsten Suel, Polytechnic University
Mike Thelwall, University of Wolverhampton
Baoning Wu, Snap
Tao Yang, Ask.com
E-mail: airweb2008@cse.lehigh.edu

Call for Papers for ACL:HLT Conference

November 30, 2007

Dr. Ellen Voorhees, over at TREC (NIST.GOV), sent me this Call for Papers:

I would like to call the IR research community’s attention to the “ACL 2008: HLT” conference.  This year ACL is making a concerted effort to attract excellent Information Retrieval research papers.  I am serving as PC co-chair for IR, and am helped by Noriko Kando, David Carmel, and Elizabeth Liddy, who are serving as “area chairs” for IR.  We are putting together a group of reviewers who have solid experience in IR, both to ensure that good IR research is recognized and to prevent poor IR research from slipping through.

Please consider submitting your IR research to ACL:HLT this year.  The ACL connection means that there will be some bias toward papers that touch on language technologies such as NLP, speech recognition, machine translation, discourse, and so on.  However, “general” IR papers are entirely within scope, and areas with IR roots or connections are also encouraged: text mining, filtering, recommendation systems, question answering, classification, clustering, sentiment analysis, etc.

The submission deadline for ACL:HLT is January 10, 2008.  ACL will be
held June 15-20 near Ohio State University in Columbus, Ohio.  (The
discount airline Skybus flies there from a large number of places around the US, and guarantees 10 seats for $10 each on every flight.  Of course, they’re probably already taken, but it’s nice to contemplate.)

If you’re considering SIGIR instead or in addition, its submission
deadline is January 28th [abstract due a week earlier].  SIGIR will be
held July 20-24 in Singapore.  SIGIR’s acceptance rate has been running slightly below 20% lately.

The complete call for papers as well as other useful information is
available at http://acl2008.org.
 

IPAM Workshop: Cyber-enabled Discovery & Information

September 26, 2007

Dr. Mark Green, from IPAM at UCLA, sent me this Call for Workshop:

Dear colleagues,

I am writing to let you know about a 1-day workshop IPAM is holding on October 29 which the NSF has just asked us to organize as part of a coordinated effort by the NSF Mathematics Institutes.  The workshop is intended to aid those interested in writing proposals to understand the initiative better and to help them in finding collaborators.

The NSF is rolling out a major new initiative in late September on “Cyber-enabled Discovery and Innovation.” This will begin as a $50 million program the first year, and will grow over the next 5 years into a $250 million program.

The goal of this workshop is to inform the scientific community about the CDI program, with the aim of eliciting strong proposals involving mathematical scientists. This workshop will be focused on the “knowledge extraction” aspect of the CDI program. For more information about CDI, see

http://www.nsf.gov/news/news_summ.jsp?cntn_id=108366

We plan on having three types of presentations:

–An information session with Q&A with a representative from NSF

–Several panels where each panelist would present a few slides about what they consider to be the interesting and important questions of long-term significance, followed by a discussion with Q&A. Topics we envision at this point are:

(i) Numerical Methods for Fast Knowledge Extraction

(ii) Nonlinear Methods for Dimensional Reduction

(iii) Knowledge Extraction from Images and Problems of Visualization

(iv) Discrete and Graph-based Techniques for Knowledge Extraction and Analysis of Large Networks

–Selected examples of success stories of applying knowledge extraction techniques from the mathematical sciences to large scale problems

For the program webpage and an online application/registration form, see http://www.ipam.ucla.edu/programs/cdi2007/ .

 

Best regards,

Mark

European Conference on Digital Libraries (ECDL 2007)

September 11, 2007

The Computer and Automation Research Institute of the Hungarian Academy of Sciences (MTA SZTAKI) cordially invites you to participate in the 11th ECDL conference to be held in Budapest, Hungary on 16-21 September 2007.

Aims and Scope:

This unique and well-established series brings together researchers, developers and practitioners working in various disciplines and related areas of digital libraries all over the world. The conference will consist of a three-day technical program, preceded by a tutorial day and followed by workshops. The technical program will include refereed paper presentations, plenary sessions, panels and poster sessions. The tutorials will provide in-depth looks at areas of current interest.

ECDL 2007 will be devoted to discussions about hot issues and applications and will primarily provide a forum to reinforce the collaboration of researchers towards the Ubiquitous Digital Libraries.

Topics include, but are not limited to:

Digital curation
Theoretical models for digital information management
Personal and personalized digital libraries
Concepts of Digital Libraries and digital content
Collection building, management and integration
System architectures, integration and interoperability
Information organization, search and usage
Multilingual information Access and Multimedia Information Handling
User interfaces for Digital Libraries
User studies and system evaluation
Digital archiving and preservation: methodological, technical and legal issues
Collaboration in DLs
Digital Library applications in e-science, e-learning, e-government, cultural heritage, etc.

For additional information, visit the conference site

ACM Document Engineering 2007 Conference

August 24, 2007

DocEng 2007 will be held at the University of Manitoba, Winnipeg, Canada from August 28 – 31, 2007.

The ACM Symposium on Document Engineering is an annual international academic conference devoted to the dissemination of research on models, tools and processes that improve our ability to create, manage and maintain documents.

Documents are one of the centerpieces of globally interconnected systems that store information drawn from many media and deliver that information as required by users. A document may be stored in final presentation form or may be generated on-the-fly, undergoing substantial transformations in the process. Documents may include extensive hyperlinks, thereby permitting virtual documents, and also making available structured collections of information on which to anchor automated reasoning, such as promoted through the Semantic Web. Furthermore, document technologies like XML are having a profound impact on data modeling, in part because of the way these technologies bridge and integrate a variety of paradigms.

To attend next week’s conference, or for additional information, visit

http://www.cs.umanitoba.ca/~doceng07/

Reviewing Papers: How-To

August 16, 2007

As a reviewer of journal manuscripts and conference papers, I normally look to see if the piece before me answers the following questions:

1. WHAT-WHY: What is the scientific problem at hand and why is it important?
2. WHO-WHAT-WHY: Who proposed what previous solutions and why are these inadequate or incomplete?
3. WHAT-YOUR-WHY: What is your proposed solution and why is it better?
4. HOW-WHAT: How is the solution implemented and what are the benefits or practical applications?
5. PROS-CONS-WHAT: What are the possible pros and cons of your solution and what are the next areas of research?

(more…)

Computer Science & Engineering, Berlin

August 15, 2007

The Fourth International Conference on Computer Science and Engineering (CSE 2007) will be held in Berlin, Germany during August 24-26, 2007.

CSE 2007 aims to bring together researchers, scientists, engineers, and students to exchange and share their experiences, new ideas, and research results about all aspects of Computer Science and Engineering, and discuss the practical challenges encountered and the solutions adopted.

(more…)

Argentine Symposium on Artificial Intelligence

August 13, 2007

ASAI, the Argentine Symposium on Artificial Intelligence, is an annual event intended to be the main forum of the Artificial Intelligence (AI) community in Argentina. The symposium aims at providing a forum for researchers and AI community members to discuss and exchange ideas and experiences on diverse topics of AI. Previous ASAI editions stimulated presentations on both applications of AI and new tools and foundations currently under development.

(more…)

2007 SIGIR – The Conference Papers

August 7, 2007

The 30th Annual International ACM SIGIR Conference ended 10 days ago (23-27 July 2007, Amsterdam). I didn’t have time to list the accepted papers/posters/demos. Here they are.

http://www.sigir2007.org has all the glory. While you are there, and if you have an account, check out Karen Sparck Jones’s online video.

(more…)

TREC 2008 Call for Papers

August 3, 2007

Dr. Ellen Voorhees, over at TREC, a NIST.gov program, sent me the TREC 2008 Call for Papers.

If you are a colleague, feel free to submit or to forward it to your college faculty across departments and disciplines.

(more…)