A Step Closer to Mind Retrieval



We are getting closer to Mind Retrieval. The implications of being able to mine the brain are obvious for all sciences, in addition to homeland security, law and order, marketing research, etc.

Last night I received this news: “Scientists map brain’s ‘thesaurus’ to help decode inner thoughts.”

Scientists at the University of California, Berkeley, have taken a step in that direction by building a “semantic atlas” that shows in vivid colors and multiple dimensions how the human brain organizes language. The atlas identifies brain areas that respond to words that have similar meanings.

Read the news here: http://www.nsf.gov/news/news_summ.jsp?cntn_id=138437&WT.mc_id=USNSF_51&WT.mc_ev=click

Last year I mentioned that we are getting close to Mind Retrieval.


That post was a reminder of a previous 2010 interview by Nuno Valenzuela, a visionary SEM from Spain. Great guy.

I met Nuno back in 2007 when I was invited to present at a Madrid Search Engine Congress (OJOBuscador) on Latent Semantic Indexing (LSI).

See conference legacy links here



Here is a link to Nuno’s interview. You may want to resize your browser window:


And some relevant links here:



Unfortunately, the OJOBuscador site is now defunct, so its links are broken.


Improving Citations with Short Titles



We updated our Keywords Spam Detector,


and added new content that editors, writers, SEOs, and others might find useful. I’m reproducing below some of the new material added.

Recommendations for Writing Titles

  • Search Engines
    Search engines might process entire titles, but tend to display fewer than about 70 characters in their search results. So you may want to keep web page titles under this mark, at roughly 60 to 65 characters.
  • Academic Journals
    Some editorial guidelines, like those of JAMA, limit the length of titles to 150 characters for reports of research and other major articles and 100 characters for Editorials, Viewpoints, Commentaries, and Letters (JAMA, 2016).
  • Words Usage in Titles
    The average length of a word in English, Spanish, and similar languages is about six characters. Thus, on average, a 60-character title amounts to about 10 words, regardless of whether these are unique terms. This is just a reference mark, as text estimates can be influenced by other variables. For instance, text averages can be topic-sensitive and influenced by syntactic structure (Busch-Lauer, 2000).
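The rules of thumb above can be sketched as a small check. This is only an illustration: the function names, the dictionary keys, and the choice of 65 characters as the search engine limit are assumptions made for this example, not part of any tool mentioned in this post.

```python
# Rough title-length check based on the rules of thumb above.
# All names here are hypothetical and chosen for illustration only.

AVG_CHARS_PER_WORD = 6  # approximate average word length cited in the text

# Character limits taken from the recommendations above.
# 65 is the upper end of the suggested 60 to 65 range (an assumption).
LIMITS = {
    "search_engine": 65,
    "jama_research": 150,
    "jama_editorial": 100,
}


def estimate_words(char_count):
    """Estimate how many words fit in a title of the given character length."""
    return char_count // AVG_CHARS_PER_WORD


def check_title(title, venue="search_engine"):
    """Report the title length and whether it fits the limit for the venue."""
    length = len(title)
    limit = LIMITS[venue]
    return {
        "length": length,
        "words": len(title.split()),
        "limit": limit,
        "within_limit": length <= limit,
    }


if __name__ == "__main__":
    print(estimate_words(60))
    print(check_title("A Step Closer to Mind Retrieval"))
```

The word estimate is only a reference mark, for the same reasons given above: actual averages vary with topic and syntax.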

How short is too short?

  • The length of a title is a relative concept. By current standards, a 60-character title, which amounts to about 10 words, is considered fair enough for search engines, very short for most academic journals, but too long for songs. Indeed, a recent study found that song titles with one or a few words are on the rise and preferred (Kopf, 2016). However, these types of titles are not informative enough for search engines and academic journals.
  • Generally speaking, articles with short titles are more attractive to readers than those with longer titles because the latter are frequently perceived as complex, confusing, or boring. If readers don’t find a title attractive or cannot understand it, there is little chance that they will read or cite its abstract or the full paper (Deng, 2015; Chawla, 2015).
  • A 2015 study confirmed that academic papers with short titles receive more citations per paper, being more attractive to readers than articles with longer titles (Letchford, Moat, & Preis, 2015).
  • A 2012 study found that short-titled articles have higher viewing and citation rates than those with longer titles. Similarly, articles with results-describing titles are cited more often than those with methods-describing titles (Paiva, Nogueira Lima, & Ribeiro Paiva, 2012). The same study found that titles that contained a question mark, referred to a specific geographical region, or used a colon or a hyphen were associated with a lower number of citations.

Visit the Keywords Spam Detector page to learn more about the topic or to follow the referenced studies. It might at least help you to investigate why artists like Rihanna and Justin Bieber prefer one-word song titles.

Keywords Spam Detector Tool



This is a new tool, available at


Term repetition abuse is considered an adversarial IR practice known as keyword spam. See the list of practices we fought at AIRWEB at http://airweb.cse.lehigh.edu/2007/cfp.html

This tool can help you to write better titles, abstracts, descriptions, paragraphs, or full text by allowing you to detect and fix over-repeated terms. The tool uses a proprietary algorithm for detecting frequency-based spam.

Once detected, over-repeated terms can be edited by either reducing their term frequency or diluting the input by adding unique terms not present in the original text.
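To make the two fixes concrete, here is a minimal sketch of frequency-based detection. It is not the tool's proprietary algorithm, which is not described here. It is a naive stand-in that flags any term whose share of all terms exceeds a threshold. The function names and the `max_share` and `min_count` parameters are assumptions made for this illustration.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word-like tokens in the text."""
    terms = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(terms)


def over_repeated(text, max_share=0.2, min_count=3):
    """Return terms that occur at least min_count times and whose share
    of all terms exceeds max_share (both thresholds are arbitrary)."""
    counts = term_frequencies(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        term: count
        for term, count in counts.items()
        if count >= min_count and count / total > max_share
    }


if __name__ == "__main__":
    original = "cheap shoes cheap shoes buy cheap shoes online"
    # Dilution: append unique terms that are not in the original text.
    diluted = original + " from our store with free delivery and easy returns"
    print(over_repeated(original))
    print(over_repeated(diluted))
```

In this sketch, diluting the input lowers each term's share without changing its raw count, which is why the appended unique terms clear the flags. Reducing the term frequency directly would work as well.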

GFR, Another newspaper company getting into Search Engine Marketing



Back on June 27, 2002, about 14 years ago, I presented the seminar “When they search for you, but find your competition” at the local EDP University before a small audience of about 30, mostly from marketing firms and academia. The main topic was Search Marketing and the Semantic Competitor.

I spent the time trying to convince the audience that the future of traditional ad agencies and multimedia companies was search engine marketing (SEM) and social digital platforms. Back then SEO and SEM were unknown three-letter acronyms. There was no Twitter or Facebook. And very few in PR knew about Google or search engines in general. Many regional and weekly newspapers were not interested in digital marketing, viewing it as minor competition.

Times have changed with search, social, and everything else under the hood. Now many traditional publishers are moving toward offering SEM, flexing their corporate muscle. Along came Infopaginas and a few others.

The latest one is Grupo Ferre-Rangel, a multimedia company and owner of the largest local newspaper in PR, El Nuevo Día. Yep, they are going full blast into search engine marketing. Back in the States it is the same: traditional newspapers are getting into SEM.

Move over, little local SEM firms. Resistance is futile, especially with the blessing of you know who…


While many are currently fighting the good war in the Social Marketing arena, let’s be ready for the next waves: Internet of Things Marketing (ITM), in smart houses, buildings, entire cities, outer space… Planet-to-Planet Internet, anyone?



Very soon.



One of the first papers on LSI



This is probably one of the first official papers on LSI that is still available online. Save it before it no longer is.


Found with the [ lsi ] query through the IRC miner at


The year was 1988. What were you doing back then?


Virus Evolution Citation



Happy to see that The Self-Weighting Model (SWM) paper


was briefly cited in the Virus Evolution journal published by Oxford University Press, in the research paper:

Coevolutionary Analysis Identifies Protein–Protein Interaction Sites between HIV-1 Reverse Transcriptase and Integrase

HTML version: http://ve.oxfordjournals.org/content/2/1/vew002.full

PDF version: http://ve.oxfordjournals.org/content/vevolu/2/1/vew002.full.pdf

This is a great example of applying data mining techniques to HIV research, a major public health issue according to WHO, UNAIDS, and other world health organizations.

The study agrees with the SWM thesis; i.e., that correlation coefficients are not additive. Glad to see how SWM influenced their data analysis.
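A toy illustration of non-additivity follows. It is not taken from the SWM paper or from the Virus Evolution study. It uses made-up numbers and a hypothetical `pearson_r` helper to show that the arithmetic mean of two correlation coefficients need not match the correlation of the pooled data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two made-up samples, each perfectly correlated on its own.
x_a, y_a = [1, 2, 3], [1, 2, 3]
x_b, y_b = [1, 2, 3], [11, 12, 13]

r_a = pearson_r(x_a, y_a)
r_b = pearson_r(x_b, y_b)

# Naive approach: treat the coefficients as additive and average them.
mean_of_rs = (r_a + r_b) / 2

# Correlation computed on the pooled observations instead.
pooled_r = pearson_r(x_a + x_b, y_a + y_b)

if __name__ == "__main__":
    print(mean_of_rs, pooled_r)
```

Each sample has r = 1, so the naive average is 1, yet the pooled data has a correlation of about 0.16 because the two samples sit at different levels of y.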

More on SWM below:



Time to restore my old tutorials on the non-additivity of correlation coefficients online so the next generations of scientists are not misled (@SEO quacks and @MOZ pseudo-scientists).




The Panama Papers Miner



The Panama Papers is a new miner available at


Find resources and entry points to the Panama Papers, the largest data leak exposing deception and corruption. Search by name, subject, or country.

A brief illustrated guide to building curated collections with Minerazzi is also available at


An example application to the Panama Papers is provided.

04-15-2016 update: 300+ additional records were indexed this morning.

Big Data Sources Miner & Search Engine



Big Data Sources is a new Minerazzi.com miner available at


This is a searchable collection of big data sources from around the world, all now at your fingertips.

Search by company, location, or service.

Additional miners are available at http://www.minerazzi.com