The Most Influential Paper that Gerard Salton Never Wrote

It is surprising how often even serious information retrieval researchers and journals cite papers that were never written!

This is the thesis of David Dubin’s excellent 2004 article, “The Most Influential Paper Gerard Salton Never Wrote”.

Dubin wrote:

“In giving credit to Salton for the vector model, a number of authors cite an overview paper titled “A Vector Space Model for Information Retrieval,” which some show as published in the JASIS in 1975 and others as published in the Communications of the Association for Computing Machinery (CACM) in 1975. In fact, no such article was ever published, and citations to it usually represent a confusion of two 1975 articles (Salton, Wong, & Yang, 1975; Salton, Yang, & Yu, 1975), neither of which were overviews of the VSM as it is generally understood (see section 5 below). Some of Salton’s own colleagues have been guilty of this mistake: both Cardie et al. and Singhal cite the CACM version, for example (Singhal, 2001; Cardie, Ng, Pierce, & Buckley, 2000). The paper is even cited in a few of the very last articles on which Salton is listed as a coauthor (Singhal, Salton, Mitra, & Buckley, 1996; Singhal & Salton, 1995). These papers were published close to or shortly after the time of his death, and so the errors cannot be blamed on Salton (remembered by his colleagues as a very careful and meticulous writer).”

Somehow far too many IR researchers miscite Salton’s 1975 paper titled “A vector space model for automatic indexing”. This causes digital libraries to create a spurious record for the nonexistent paper, which then gets attached to the many articles that cross-reference it.

I searched Google for “a vector space model for information retrieval” + salton and indeed there are many reputable publications and researchers citing a paper that was never published! What a shame.

That says a lot about the researchers, editors, and reviewers who never bothered to check the accuracy of their references.


  1. Thanks for your kind words about my article; I’m glad you found it interesting. But you know, it really wasn’t my intention to wag my finger at my colleagues’ sloppy citation practices. I think references to Salton’s phantom paper point to a basic theoretical problem in what the VSM actually models, and how we understand what IR models do for us.

    It’s the difference between saying something like:

    A) I wrote some software, and here’s a geometric model to help you understand exactly what it does.

    as opposed to…

    B) I want to advance a substantive claim about the nature of documents and queries. My model makes some simplifying assumptions, but its purpose is to represent something real about the world in which my software functions — something that’s independent of my system design decisions.

    I see a basic confusion between approaches A and B that shows up surprisingly often in information science research.

    Dave Dubin

  2. Hi, Dave:

    Thank you for stopping by and commenting. I understand what you’re saying.

    Regardless, sloppy referencing/reviewing of papers is what it is: no more, no less.

    Between A and B above, I prefer A. I never like long sentences, particularly with three “my”s.

  3. Hi, William:

    Thank you for stopping by.

    You blogged an interesting discussion on the topic. The points you raised are quite valid.

    Students beware: if the raw data is flawed, so is the analysis. I believe this goes for citation analysis built on wrong citations, or for any analysis for that matter.

    1. Wrong!

      That’s precisely the article misquoted/wrongly referenced in the IR literature.

      The issue here is the TITLE of the article. No article with the title “A vector space model for information retrieval” was ever published by Salton.

      Please read this post again.
