It is surprising how even serious information retrieval researchers and journals cite papers that were never written!
This is the thesis of David Dubin’s great 2004 article,
The Most Influential Paper Gerard Salton Never Wrote
Dubin wrote:
“In giving credit to Salton for the vector model, a number of authors cite an overview paper titled “A Vector Space Model for Information Retrieval,” which some show as published in the JASIS in 1975 and others as published in the Communications of the Association for Computing Machinery (CACM) in 1975. In fact, no such article was ever published, and citations to it usually represent a confusion of two 1975 articles (Salton, Wong, & Yang, 1975; Salton, Yang, & Yu, 1975), neither of which were overviews of the VSM as it is generally understood (see section 5 below). Some of Salton’s own colleagues have been guilty of this mistake: both Cardie et al. and Singhal cite the CACM version, for example (Singhal, 2001; Cardie, Ng, Pierce, & Buckley, 2000). The paper is even cited in a few of the very last articles on which Salton is listed as a coauthor (Singhal, Salton, Mitra, & Buckley, 1996; Singhal & Salton, 1995). These papers were published close to or shortly after the time of his death, and so the errors cannot be blamed on Salton (remembered by his colleagues as a very careful and meticulous writer).”
Somehow, far too many IR researchers miscite Salton’s 1975 paper titled “A vector space model for automatic indexing”. This causes digital libraries to create a spurious record that is then attached to many cross-referenced articles.
I searched Google for “a vector space model for information retrieval” + salton, and indeed there are many reputable publications and researchers citing a paper that was never published! What a shame.
That says a lot about the researchers, editors, and reviewers who never bothered to check the accuracy of the references.
Thanks for your kind words about my article; I’m glad you found it interesting. But you know, it really wasn’t my intention to wag my finger at my colleagues’ sloppy citation practices. I think references to Salton’s phantom paper point to a basic theoretical problem in what the VSM actually models, and how we understand what IR models do for us.
It’s the difference between saying something like:
A) I wrote some software, and here’s a geometric model to help you understand exactly what it does.
as opposed to…
B) I want to advance a substantive claim about the nature of documents and queries. My model makes some simplifying assumptions, but its purpose is to represent something real about the world in which my software functions — something that’s independent of my system design decisions.
I see a basic confusion between approaches A and B that shows up surprisingly often in information science research.
Dave Dubin
Hi, Dave:
Thank you for stopping by and commenting. I understand what you’re saying.
Regardless, sloppy referencing and reviewing of papers is what it is: no more, no less.
Between A and B above, I prefer A. I have never liked long sentences, particularly ones with three “my”s.
Pingback: Citing papers that you’ve never read — or that were never written « IREvalEtAl
Hi, William:
Thank you for stopping by.
You blogged an interesting discussion on the topic. The points you raised are quite valid.
Students beware: if the raw data is flawed, so is the analysis. I believe this applies to citation analysis built on wrong citations, or to any analysis for that matter.