IR Thoughts

~ Thoughts on Information Retrieval, Data Mining, and Search Engines

Monthly Archives: February 2011

On Telneting and other nifty protocols

28 Monday Feb 2011

Posted by egarcia in Data Mining, Internet Engineering

I’ve set up a new server and a few services on Windows Vista. These features are not enabled out of the box and must be installed first. This morning I feel like sharing, so here goes this post.

 

In Windows Vista, you need to install the Telnet Client:

1. Navigate to Start > Control Panel > Programs > Programs and Features > Turn Windows features on or off. If you are prompted for an administrator password or confirmation, type the password or provide confirmation.

2. In the Windows Features dialog box, select the Telnet Client check box.

3. Click OK. The installation might take several minutes.
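Once the Telnet Client is installed, a common use is to check whether a remote TCP port answers, as in `telnet host port`. For those without the client at hand, here is a minimal Python sketch of that same reachability check. The function name `port_is_open`, the 3-second timeout, and the `localhost` target are my own assumptions for illustration; this only tests whether a TCP connection can be opened and does not speak the Telnet protocol itself.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper: it mimics the quick 'does this port answer?'
    check people often run with a Telnet client, nothing more.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed target: the Telnet port (23) on the local machine.
    print(port_is_open("localhost", 23))
```

If the Telnet Server feature is not installed and running, the check against port 23 would normally come back False.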

 

Other nifty features available for installation are:

RIP Listener

SNMP

Simple TCP/IP Services

Telnet Server

TFTP Client

 

 

IRW 2011-4-2: n-Grams and Association Measures

18 Friday Feb 2011

Posted by egarcia in Data Mining, Newsletters, Web Mining Course


The current issue of IRW should reach subscribers’ inboxes during the day.

This is Part Two of the series on statistical analysis of n-grams, a text mining technique widely used in information retrieval and in data mining in general. In this issue we cover the implementation of association measures derived from contingency tables.
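For readers new to the topic, here is a rough Python sketch of two common association measures computed from a 2 x 2 contingency table for a bigram (w1, w2). The choice of measures (pointwise mutual information and the Dice coefficient), the function name `bigram_association`, and the cell labels are my assumptions for illustration; they are not necessarily the measures or notation used in the newsletter.

```python
import math


def bigram_association(o11, o12, o21, o22):
    """Association measures for a bigram from its 2 x 2 contingency table.

    Assumed cell layout (observed bigram counts):
      o11: w1 followed by w2
      o12: w1 followed by any other word
      o21: any other word followed by w2
      o22: neither w1 first nor w2 second
    Requires o11 > 0, since PMI takes a logarithm of the observed count.
    """
    n = o11 + o12 + o21 + o22   # total number of bigrams
    r1 = o11 + o12              # bigrams starting with w1
    c1 = o11 + o21              # bigrams ending with w2
    expected = r1 * c1 / n      # expected o11 if w1 and w2 were independent
    pmi = math.log2(o11 / expected)
    dice = 2 * o11 / (r1 + c1)
    return {"expected": expected, "pmi": pmi, "dice": dice}


if __name__ == "__main__":
    print(bigram_association(10, 10, 10, 70))
```

A PMI above zero means the pair occurs more often than independence would predict; a Dice value near 1 means the two words almost always appear together.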

The Q&A section explains how to conduct a chi-square test for tables with many items, i.e., beyond the usual 2 x 2 contingency tables.
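As a companion to that idea, here is a minimal Python sketch of the Pearson chi-square statistic for a table with any number of rows and columns. It is the textbook calculation, not a reproduction of the newsletter's worked example; the function name `chi_square` and the sample table are made up for illustration, and the sketch assumes every row and column total is greater than zero.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts. Assumes all row and
    column totals are positive, so every expected count is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


if __name__ == "__main__":
    # Made-up 2 x 3 table of observed counts.
    print(chi_square([[10, 20, 30], [20, 20, 20]]))
```

The statistic is then compared against the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom to decide whether the rows and columns look independent.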

Enjoy it.

Are we near the end of hardcopy scholarly journals?

14 Monday Feb 2011

Posted by egarcia in Marketing Research, Miscellaneous

Lang (2010) asks whether hardcopy scholarly journals are near their end. I know, I know. This is kind of an elephant in the room.

Lang raises the question based on the following bullet points:

1. Forty-page Articles Are Dead.
2. Survey Articles Are Dead.
3. Journal Issues Are Dead.
4. Page Numbers Are Dead.
5. Copy Editing Is Dead.
6. Peer Reviewing Might Be Dying Too.
7. The Article as a Unit of Publication Is Dead.

Lang then concludes with a question and a call to action.

A New Beginning for Scholarly Publishing?

“So let’s abandon all the 20th-century baggage of traditional journals, and move to a more rational model for scholarly publication, with no copy editors, no reviewers, no redundancy, and no unnecessary delays. A concrete step would be to give each ACL member a DOI for a unipaper, and then ask them to non-redundantly populate this with a sequence, or a tree, of numbered paragraphs that consolidate all their work on a topic. Then, to get things moving, the present journal could insist that some proportion of citations be to paragraphs within these unipapers, with hyperlinks embedded right there in the citations. What are we waiting for?”

Feel free to take issue with any of the above points.

My opinion? Lang has very good arguments. However, I would say that because of the changing times (read: smartphones, tablets, blogs, social networks, etc.), many hardcopy scholarly journals are actually evolving, while the weaker ones, or those unfit to change, are dying off, a natural phenomenon observed in online ecosystems. This is not unique to scholarly journals. The same is true of any hardcopy journal, newspaper, magazine, or newsletter.

With more retailers giving discounts and even freebies just for showing a tweet about their products or services at their store, who knows what the fate of flyers, coupons, etc. will be.

Publishers that don’t adjust their business models to the changing times are doomed to become the next LPs, 8-tracks, cassette tapes, etc.

Lang, N. (2010). Are We Near the End of the Journal. Computational Linguistics, Volume 36, Number 4. Retrieved from http://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00019

Spreading some news

10 Thursday Feb 2011

Posted by egarcia in Newsletters, Vector Space Models


Back to blogging. I’ve been very busy putting together a paper on a weighting model and answering feedback received from colleagues on it.

So this might explain why the January IRW newsletter is delayed. It should arrive in subscribers’ inboxes during the day. The February issue will be out in about one week. These are back-to-back issues on Statistical Analysis of N-Grams.

Part 1: N-Grams & Contingency Tables

Part 2: N-Grams & Association Measures

On other matters, a few years ago a PhD student published an excellent application of the Vector Space Model to protein analysis. You can revisit the post at http://irthoughts.wordpress.com/2008/11/12/vector-space-model-and-protein-retrieval/.

If others have other applications of VSM in other disciplines, let me know. I’m interested in multidisciplinary stuff.
