Five years ago, in the article “As we may search – Comparison of major features of the Web of Science, Scopus, and Google Scholar citation-based and citation-enhanced databases”, Peter Jacso, from the Department of Information and Computer Science, University of Hawaii, compared several indexes, particularly Google Scholar (G-S). The article found G-S to have too many flaws. He concluded:
“Unfortunately, G-S gives a bad name to autonomous citation indexing. It shows lack of competence, and understanding of basic issues of citation indexing. G-S fails even in implementing the most basic Boolean OR operation correctly. Riding on the waves of the regular Google software which is great for processing the unstructured heap of billions of Web pages, G-S cannot handle even the meticulously tagged, metadata-enriched few million journal articles graciously offered to it by many publishers for free.”
In my opinion, not much has changed since then. That’s why I use G-S as my last option. I prefer public scholarly search resources for getting, at no cost, articles that are not found in G-S or that I would otherwise have to pay for through journal indexes. Some of these free resources are designed to retrieve a PDF version of the manuscripts sought. More than one university and private company prefers to make its own open source index available to anyone while blocking Googlebot.
CiteSeer and DBLP provide quite a few sources for most CS-related publications. They are better than Google Scholar for most searches – no doubt about it.
Though not directly related to this topic, I want to say how I feel about ACM’s policy. I’m just disappointed that ACM papers aren’t freely available. The subscription model doesn’t hurt the researcher community. It hurts young undergrads with no money (I was one a few years ago) who hit the paywall when searching for publications.
Hi, Neel:
Thank you for stopping by.
I agree. I’ve used these with great success. There are a dozen or so scholarly resources better than G-S. To mention a few:
Arxiv, CiteseerX (Citeseer upgrade), PDFGeni, Connotea, Pubget, INFOMINE, Highwire, DOAJ, DLIB, Scirus, Muse, CiteUlike, etc. I have more to suit colleagues by field of expertise. Thank God that the Deep Web is free by nature.
I would respectfully disagree. It is a great resource for a researcher, especially for an amateur who does not have exhaustive journal subscriptions.
It also has a great feature: it can find all the articles that cite a given one. It is simply invaluable for those who want to keep track of modern trends. Other scientific collections do support such a search, but it is useless without a comprehensive index such as Google Scholar. Last but not least, it is reliable, unlike, e.g., CiteSeer.
I must admit that Google Scholar changed my life.
Thank you for stopping by.
You are welcome to disagree.
As a solution for an amateur as you said, I guess it is okay.
Good for you that G-S has changed your life.
The comprehensiveness of the G-S index has been disputed within librarian circles and, after all, it is in the eyes (and fingers) of the searcher behind the query box.
When it comes to comparing scholarly engines with G-S, more than one academic or librarian will disagree with you to no end, and will see G-S as a life-changing experience for the worse.
Here is a revisit Jacso did two years ago on G-S (http://www.jacso.info/PDFs/jacso-GS-revisited-OIR-2008-32-1.pdf). While some improvements had been made, he found more serious flaws in the query modes, as well as illiteracy problems.
And in this other article (http://www.jacso.info/PDFs/jacso-comparison-analysis-of-citedness.pdf), Dr Jacso compares Web of Science with G-S and concludes:
“WoS identifies far more citing sources per source documents than GS. WoS sources are limited to serial publications, but most of them are among the most prestigious journals in their respective disciplines. WoS makes available the list of journals (20) processed for its citation indexes (19). ISI’s competence and experience in the theory and practice of citation indexing is apparent from the test.
The main virtue of GS is that currently it is free for anyone. It is certainly an asset for those who cannot afford the professional multidisciplinary citation-enhanced databases or who need only a few good scholarly articles on a subject. GS is limited to the sources made available for its crawlers by publishers, and to open access sources with widely differing qualities. Thousands of scholarly journals and millions of articles are ignored by GS, or are underreported in terms of citedness. Many of them are top ranked ones in their respective categories such as Vaccine and the Journal of Allergy and Clinical Immunology in this test where many articles from these journals citing papers in APJAI were ignored by GS and thus not counted for the citedness score. For the scholarly users it can be detrimental as many of the most cited articles are ranked much lower on the result lists of GS than the average or mediocre articles. The poor capabilities of GS to consolidate the matching records inflates both the number of hits and the citedness score. This in turn further distorts the ranking of the results.”
So much for G-S index comprehensiveness.
Here are entry points to many research articles on G-S pros and cons:
http://www.emeraldinsight.com/Insight/ViewContentServlet;jsessionid=14C9CF54F233A6B21CC16EBBC69326F4?Filename=/published/emeraldfulltextarticle/pdf/2640320510_ref.html
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2000776/
http://hlwiki.slais.ubc.ca/index.php/Google_scholar_bibliography
Some research articles by Jacso about G-S:
Jacsó, P., Google Scholar Redux. Gale — Reference Reviews [online]
(June 2005). http://googlescholar2.notlong.com
Jacsó, P., Google Scholar: the Pros and the Cons. Online Information
Review, 2005, 29, 208-214.
http://dx.doi.org/10.1108/14684520510598066
Jacsó, P., Citation Enhanced Indexing/Abstracting Databases. Online
Information Review, 2004, 28, 235-238.
http://dx.doi.org/10.1108/14684520410543689
Jacsó, P., Citedness Scores for Filtering Information and Ranking Search
Results. Online Information Review, 2004, 28, 371-376.
http://dx.doi.org/10.1108/14684520410564307
Jacsó, P., Browsing Indexes of Cited References. Online Information
Review, 2005, 29, 107-112.
http://dx.doi.org/10.1108/14684520510583972
Perhaps I should add the reservation that it is comprehensive in CS. It indexes the ACM libraries (which hold, I believe, 30-50% of all relevant articles), IEEE, and Elsevier journals. GS was created by CS scholars: no wonder that it focuses on CS articles.
“Only a few good scholarly articles on a subject.”
I typically read dozens every year (around 50). Maybe that is a few, indeed.
My wife, who is an environmental economist, reads about the same number. We have probably never (or maybe a couple of times) had a case where an article we sought was not in GS. That is pretty comprehensive to me.
“GS is limited to the sources made available for its crawlers by publishers, and to open access sources with widely differing qualities.”
It is true, but in CS (see comment above), it is enough. Of course, many things can indeed be improved and many bugs fixed.
Hi, Leo:
“GS was created by CS scholars: no wonder that it focuses on CS articles.”
That can be disputed, but assuming that is the case, it is then a shame that the G-S Boolean OR mode often returns faulty search counts. It was that way before Google built G-S and has remained that way since.
I remember that about five years ago or so I confronted GoogleGuy at the SearchEngineWatch Forums (my handle was Orion) about how broken OR searches were in Google. I did so in a series of posts, and the issue was not addressed or acknowledged at all. You can search for this at the SEW forum, where I was a moderator. Five years later I’m still disappointed at their OR answer set. So much for comprehensiveness.
Certainly no IR system is free from flaws, but the above was too obvious to be ignored by so-called CS scholars. So I guess these were sloppy scholars. For instance,
cats OR dogs counts 1,560,000 results
dogs OR cats counts 1,600,000 results
cats counts 2,200,000 results
dogs counts 1,760,000 results
OR searches should count more results, not less.
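The expectation behind these numbers can be stated precisely. For exact counts, the result set of A OR B is the union of the two result sets, so its size can be no smaller than the larger single-term count and no larger than the sum of both. Here is a minimal Python sketch of that sanity check, applied to the counts quoted above. The function names are illustrative only and are not part of any search engine API.

```python
from typing import Tuple


def or_count_bounds(count_a: int, count_b: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the size of the union of two result sets.

    lower: one set fully contains the other.
    upper: the two sets do not overlap at all.
    """
    return max(count_a, count_b), count_a + count_b


def is_plausible_or_count(count_a: int, count_b: int, reported_or: int) -> bool:
    """True if a reported OR count is consistent with exact Boolean logic."""
    lower, upper = or_count_bounds(count_a, count_b)
    return lower <= reported_or <= upper


# Counts quoted in the comment above.
cats, dogs = 2_200_000, 1_760_000
reported = {"cats OR dogs": 1_560_000, "dogs OR cats": 1_600_000}

lower, upper = or_count_bounds(cats, dogs)
for query, count in reported.items():
    verdict = "plausible" if is_plausible_or_count(cats, dogs, count) else "impossible"
    print(f"{query}: {count:,} (expected {lower:,} to {upper:,}) -> {verdict}")
```

Both reported OR counts fall below the lower bound of 2,200,000, and they differ from each other even though OR is commutative.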
As Jacso put it (not exactly using the following words):
‘G-S OR makes George Boole turn over in his grave.’
He said:
“If you add (out of curiosity) the letter “a” in an OR relationship, the result set should increase by picking up records for foreign language source documents which use the letter a as the definite article and/or a preposition. In the extreme case, if all anglophone records had the letter “a” as the indefinite article or part of terms such a “blood type A”, “personality A”, “grade A”, the number of hits would not increase.
But in Google Scholar the OR operator decreases the result set to less than 1 per cent of the original set and makes George Boole turn over in his grave. The regular Google search engine does not take part in this nonsense. Some may feel lucky (rather than befuddled) that, although both search terms were purportedly excluded from the search (as the message shows), Google Scholar still could provide with nearly 14 million hits – without using the + sign or the double quotation mark. Actually, it shows only 1,000 hits at most for any query, so it can claim any number above 1,000 without the burden of proof. If gamblers could bluff in casinos without the burden of showing their cards for the blackjack dealer, there would be many more instant millionaires than at the GooglePlex.”
http://www.jacso.info/PDFs/jacso-GS-revisited-OIR-2008-32-1.pdf
“OR searches should count more results, not less.”
I can see a faulty assumption here: the Google result count is never exact. It is only an estimate, because Google uses pruning aggressively.
Talking about the faults: it is always very hard to deliver new functionality and to fix bugs. There are always opinions and entrenched developers, deadlines, and limited budget. In the end, somebody is always unhappy about something.
Almost all search engines give you estimates rather than exact counts, especially when hitting different instances of the index.
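To make the point about estimates concrete, here is a toy Python sketch of how a count extrapolated from a single index instance can break the Boolean expectation even when the underlying data does not. Everything in it is an assumption made for illustration: the two-shard index, the document IDs, and the scale-by-number-of-shards rule are hypothetical and are not a description of how Google or any other engine computes its counts.

```python
from typing import Dict, List, Set

# A shard maps each term to the set of document IDs it holds (hypothetical data).
Shard = Dict[str, Set[int]]


def exact_or_count(shards: List[Shard], terms: List[str]) -> int:
    """Exact size of the union of all matching documents across every shard."""
    matches: Set[int] = set()
    for shard in shards:
        for term in terms:
            matches |= shard.get(term, set())
    return len(matches)


def estimated_or_count(shards: List[Shard], terms: List[str], shard_index: int) -> int:
    """Estimate: count matches on one shard only, then scale by the shard count."""
    local: Set[int] = set()
    for term in terms:
        local |= shards[shard_index].get(term, set())
    return len(local) * len(shards)


# Two unevenly populated shards with disjoint document IDs.
shards: List[Shard] = [
    {"cats": {1, 2, 3, 4, 5, 6}, "dogs": {5, 6, 7}},
    {"cats": {11, 12}, "dogs": {12, 13}},
]

print("exact cats:        ", exact_or_count(shards, ["cats"]))
print("exact cats OR dogs:", exact_or_count(shards, ["cats", "dogs"]))
# The single-term query happens to be answered by shard 0, the OR query by shard 1.
print("estimated cats:        ", estimated_or_count(shards, ["cats"], 0))
print("estimated cats OR dogs:", estimated_or_count(shards, ["cats", "dogs"], 1))
```

In this toy setup the exact OR count is larger than the exact single-term count, yet the estimate for the OR query comes out smaller than the estimate for the single term, simply because the two queries were answered by different shards. Whether this explains the behavior of any real engine is a separate question.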
The OR searches I tested back then were from ASK, Yahoo, MSN, and a few others, some using pruning. I also tested experimental ones. Yet only Google’s OR mode was broken. The others gave what one would expect. So it is hard to attribute this to pruning. Cats pruning dogs or dogs pruning cats? Please. Google OR has always been broken. It cannot be justified.
I agree that it is always very hard to deliver new functionality and to fix bugs. I don’t know about Google having a limited budget. Certainly in the end somebody is always unhappy about something, or happy to look the other way as well.
if you are using google scholar, our paper “on the robustness of google scholar against spam” might interest you. we have researched how difficult it is to manipulate e.g. citation counts on google scholar. in short: it is very easy, and users should be aware that google scholar shouldn’t be blindly trusted
http://sciplore.org/blog/2010/06/12/new-paper-on-the-robustness-of-google-scholar-against-spam/
Hi, sciplore:
Thank you for stopping by and sharing your research.
It is a shame that G-S is falling for invisible text in PDFs and inflated citations. These are legacy spam techniques from the ’90s.
PS.
Thank you also for sharing your research on ASEO (Academic Search Engine Optimization).
http://www.sciplore.org/publications/2010-ASEO–preprint.pdf
I have an interface for searching academic databases and found the preprint by hitting the PDFGeni option.