If you are an IR researcher looking for open source software, this post is for you.
During the second day of the OJOBuscador Congress 2.0, held in Madrid, Spain (March 8 and 9), I attended the IR with Usability track.
The first speaker was Dr. Carlos Castillo from Yahoo! Research Spain. He presented on adversarial IR and web spam.
The next speaker, IR practitioner Jose Ramon-Perez Aguera, presented several open source software packages. I want to share with you a handy list, thanks to Jose’s presentation:
About a year ago I downloaded Terrier, a project from the University of Glasgow. I learned about Terrier through kind email exchanges with Dr. Keith (C.J.) van Rijsbergen. Prof. van Rijsbergen is the winner of the 2006 Salton Award, conferred by SIGIR.
Lucene and Nutch are well known. The others are less well known, at least in the Caribbean.
I am now playing with two industry-strength workbenches. One is open source: WEKA. The other is not: XLMiner (but trial demos are available online).
The great thing about the WEKA (Java-based) machine learning workbench is that it can be used for both teaching and serious research.
XLMiner is Excel-based and more suitable for graduate courses in business administration, but it can also be used for both teaching and research.
I am planning to use both for a data mining graduate course I’m putting together for a local university.
Installing and running WEKA is by itself a learning project for dedicated graduate students. The great thing is that once installed, faculty, staff, and university affiliates can use it as a labspace for further research.
Currently, several textbooks and CS graduate schools use WEKA as their “de facto” companion platform.