At the time of writing, curated collections of topics and authors can easily be built by mining Google Scholar results with Minerazzi. The following example illustrates how.

Mining Topics Example

1. Search for [ pagerank ] with the Information Retrieval Collection (IRC) miner at http://www.minerazzi.com/irc

2. Locate the result whose URL is http://scholar.google.com.pr/scholar?q=pagerank and click the Search Inside tool icon located below said result.

3. Note from the output of step 2 that, for Google Scholar, some of the links discovered by the tool point to co-authors identified by Google. Locate a co-author and click the Search Inside tool icon, this time located to the right of said result.

4. You will be presented with a new list of results, each with its own Search Inside icon. Some of these results are co-authors.

By recursively using Search Inside you can build a curated collection on the pagerank topic or a curated collection of co-authors, without having to resubmit the query.
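The recursive use of Search Inside described above can be pictured as a breadth-first walk over links. What follows is only a minimal Python sketch under stated assumptions: it does not call Minerazzi or Google Scholar, and `mine_links`, `toy_fetch`, `TOY_GRAPH`, and the short page labels are hypothetical names invented for illustration. A small in-memory link graph stands in for the links the tool would discover on each page.

```python
from collections import deque

def mine_links(seed_url, fetch_links, max_depth=2):
    """Collect pages reachable from seed_url, breadth first.

    fetch_links is any callable returning the links found on a page.
    It is a stand-in for a link-discovery step, not a real Minerazzi API.
    """
    seen = {seed_url}                 # pages already queued, to avoid repeats
    collection = []                   # the curated collection, in visit order
    queue = deque([(seed_url, 0)])
    while queue:
        url, depth = queue.popleft()
        collection.append(url)
        if depth == max_depth:        # do not follow links past this depth
            continue
        for link in fetch_links(url): # an empty list is a dead end
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return collection

# Hypothetical toy link graph standing in for real pages.
TOY_GRAPH = {
    "scholar?q=pagerank": ["author/A", "paper/1"],
    "author/A": ["author/B", "paper/1"],
    "author/B": [],
    "paper/1": [],
}

def toy_fetch(url):
    """Return the links 'found' on a page of the toy graph."""
    return TOY_GRAPH.get(url, [])

print(mine_links("scholar?q=pagerank", toy_fetch))
# → ['scholar?q=pagerank', 'author/A', 'paper/1', 'author/B']
```

The initial query is issued once, as the seed. Every later page is reached by following links from an earlier one, which mirrors the point that the query does not have to be resubmitted.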

This approach assumes that the initial Google Scholar URL to be mined is already in the IRC microindex. For other queries, you need to query Google Scholar yourself and submit the search results URL to IRC for indexing. Once indexed, it can be mined as described above.
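For queries other than pagerank, the search results URL can be assembled by hand or with a few lines of code. The sketch below assumes the same URL pattern as the example in step 2 (`http://scholar.google.com.pr/scholar?q=...`). The function name `scholar_results_url` is hypothetical, and the step of submitting the URL to IRC is not shown because its interface is not described here.

```python
from urllib.parse import urlencode

# Base URL taken from the example in step 2; other Google Scholar
# domains are assumed to follow the same pattern.
SCHOLAR_BASE = "http://scholar.google.com.pr/scholar"

def scholar_results_url(query):
    """Build a Google Scholar search results URL for a query string."""
    # urlencode escapes spaces and special characters in the query.
    return SCHOLAR_BASE + "?" + urlencode({"q": query})

print(scholar_results_url("pagerank"))
# → http://scholar.google.com.pr/scholar?q=pagerank
print(scholar_results_url("latent semantic indexing"))
# → http://scholar.google.com.pr/scholar?q=latent+semantic+indexing
```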

However, if a user discovers a Google Scholar URL while using the Search Inside tool on a previous result, said URL can be recrawled and mined as described above, so it does not need to be in the IRC microindex at all.

In general, any URL searchable with Search Inside can be mined, unless the tool hits a dead end (a page with no accessible links to follow).
