IR Thoughts

~ Thoughts on Information Retrieval, Data Mining, and Search Engines

Monthly Archives: February 2012

Which separators to use with title tags?

24 Friday Feb 2012

Posted by egarcia in Marketing Research, SEO Myths

More and more SEOs are using separators like pipes, dashes, and commas when writing title tags. (Exclude underscores from the list; they act as concatenators, joining words rather than separating them.)
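To illustrate the separator versus concatenator distinction, here is a minimal sketch using Python's regex word class, which treats the underscore as a word character. The `tokenize` helper and the sample titles are hypothetical, and this is only an assumption about how a simple tokenizer might behave, not a description of how any search engine actually parses titles.

```python
import re

def tokenize(title):
    """Split a title into lowercase tokens on runs of non-word characters.

    Hypothetical helper: the regex word class matches letters, digits,
    and the underscore, so underscores never split a token.
    """
    return re.findall(r"\w+", title.lower())

# Illustrative titles only; they are not taken from any real page.
samples = [
    "IR Thoughts | Title Tags",
    "Title-Tags, SEO",
    "title_tags",
]

for title in samples:
    print(title, "->", tokenize(title))
```

Under this simple tokenizer, pipes, dashes, and commas all yield the same kind of split, while `title_tags` survives as a single token.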

Which of these separators perform better?

From time to time some SEO “experts” and their cheerleaders promote the idea that a particular separator performs better than others. See the following link:

http://searchenginewatch.com/article/2154469/How-to-Write-Title-Tags-For-Search-Engine-Optimization

And that is despite the fact that Google’s Matt Cutts has mentioned in two different videos that there is no real significant advantage in using one over the other. These videos are available at the following links:

http://www.youtube.com/watch?v=T2_7PTio3Qc

http://youtu.be/jHSqLYUPq8w

My take?

I would like to see an analytical study supporting such claims. I believe this is a reasonable thing to ask. Don’t you think so?

A Study of Puerto Rico Newspaper Home Pages

15 Wednesday Feb 2012

Posted by egarcia in Data Mining, Marketing Research

A study of Puerto Rico newspaper home pages is now available in ppsx format at Minerazzi (http://www.minerazzi.com/demos/newpapers-report.ppsx).

Hey, SEOs: On Information Gain, Keyword Wallop, and Relevance

13 Monday Feb 2012

Posted by egarcia in Human-Computer Interaction, Machine Learning, Marketing Research

Which words pack more wallop, are more emphatic, are more beefy or juicy? Whatever you want to call it, if you are an SEO or copywriter, you probably know what I mean.

Well, the answer to such a question depends on what you are trying to accomplish.
According to the family of BM25 algorithms,

http://irthoughts.wordpress.com/2011/08/04/bm25-and-bm25f-implications-to-seo-and-web-design/

a term has more information gain during its first occurrences, especially if these occur earlier in a document. This presumes some kind of relationship between information gain and the position and distribution of words in a document.
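The "first occurrences" part of that claim can be sketched with the standard BM25 term-frequency component, which saturates as a term repeats. This is a minimal sketch under stated assumptions: the function names are hypothetical, the parameter values k1 = 1.2 and b = 0.75 are common textbook defaults rather than anything a particular engine is known to use, and the formula does not model word position at all (field weighting, as in BM25F, is a separate matter).

```python
def bm25_tf_weight(tf, k1=1.2, b=0.75, doc_len=100, avg_doc_len=100):
    """Term-frequency component of BM25 (the IDF factor is left out).

    k1 controls saturation and b controls length normalization; the
    default values here are assumptions chosen for illustration.
    """
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)

def marginal_gains(max_tf, **kwargs):
    """Extra weight contributed by the 1st, 2nd, ..., max_tf-th occurrence."""
    weights = [bm25_tf_weight(tf, **kwargs) for tf in range(max_tf + 1)]
    return [weights[i] - weights[i - 1] for i in range(1, max_tf + 1)]

# Each additional occurrence adds less than the one before it.
for n, gain in enumerate(marginal_gains(5), start=1):
    print(f"occurrence {n}: +{gain:.3f}")
```

With these assumed defaults and a document of average length, the first occurrence contributes 1.0, the second about 0.38, and the third about 0.20, and the total can never exceed k1 + 1. Repeating a keyword yields rapidly diminishing returns.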

Journalists and editors understand the concept. That’s why they like to answer the who, what, when, why, and how early in their copy, although not necessarily in that order.

And that’s why you see so many press release titles written in a ‘who-what’ way!

That strategy might work with search engines, but if you want to emphasize more specific keywords in a natural way you probably need a different keyword positioning strategy, at least if you write in English.

Says who? William Strunk, Jr., in his book The Elements of Style, as discussed by Jose Carillo, whom I quote:

http://josecarilloforum.com/forum/index.php?topic=496.0;prev_next=next

In his original 1918 edition of The Elements of Style (that was long before E. B. White came up with a chapter on style that made him a co-author of the book), William Strunk, Jr. came up with this perplexing prescription in his discussion of the principles of exposition:

“The proper place for the word, or group of words, which the writer desires to make most prominent is usually the end of the sentence…The word or group of words entitled to this position of prominence is usually the logical predicate, that is, the new element in the sentence…”

Strunk gave the following example to illustrate his point:

The modifying phrase at the tail-end of the sentence: “This steel is principally used for making razors, because of its hardness.”

The logical predicate at the tail-end of the sentence: “Because of its hardness, this steel is principally used in making razors.”

And here is the eye-opening point:

For his final words on the subject, however, Strunk made the following provocative—and as I already said, perplexing—prescription:

“The principle that the proper place for what is to be made most prominent is the end applies equally to the words of a sentence, to the sentences of a paragraph, and to the paragraphs of a composition.”

Carillo’s essay is an excellent one. He later wrote a follow-up post, from which I quote:
http://josecarilloforum.com/forum/index.php?topic=627.0

In spoken English, we can emphasize the ideas we want to emphasize by giving them a stronger stress, leveling off our voice when enunciating minor or neutral ones, and downplaying the points that simply don’t support our contention. In writing, however, the process is rarely that simple. We can achieve emphasis only with our choice of words and how we array them into word clusters, into clauses and phrases, and ultimately into sentences and paragraphs. Mechanical devices exist that help, of course, like underlining, boldface type, italics, headlines and subheadlines, and—in today’s savvy word-processing routines—even colors, clip-arts, and emoticons. But as the aspiring writer soon discovers, much of the emphasis we seek has to be built into the very contours of the individual words as they unfold on the page.

There are three basic word-positioning principles we must know for maximum emphasis in writing English sentences: first, the initial and terminal positions of sentences are by nature more emphatic than their middles; second, when we construct a complex sentence, the main clause gets more emphasis than subordinate clauses; and third, when everything is written and done, the last words of the sentence are normally the most emphatic of all. These are structurally inherent in the English language itself, as we will see more clearly when we study them in closer detail.

Carillo then mentions three important concepts:
1. The initial and terminal positions of sentences are prime.
2. The main clause gets more emphasis than subordinate clauses.
3. The last words of the sentence are normally the most emphatic.

The takeaway

Clearly, all this shows that although interrelated, information gain, keyword wallop, and relevancy are not the same thing. Relevancy is more along the lines of “aboutness”, “eliteness”, and a few other semantic concepts.

The problem is that there is a relevance perception divide between machines and end-users, a topic that we have discussed before. See this link:

http://irthoughts.wordpress.com/2007/06/01/sneak-preview-of-ir-watch-2007-6-issue/

Still thinking about the keyword density/spamming crap?

Social Media and Puerto Rico Local Brands

05 Sunday Feb 2012

Posted by egarcia in Marketing Research

Arteaga & Arteaga, an affiliate of MSL Group, invited Stephen Marino, one of MSL’s directors, to present “Social and Digital Impact on Reputation and Growth”, an interesting topic.

According to an article written by Alana Alvarez Valle from El Vocero, one of the largest newspapers in Puerto Rico (translated here from the Spanish),

Although many companies have now adopted social networks to produce benefits, there is still resistance to ceding a little power to their consumers.

Stephen Marino, director of social and digital media at MSL Group, explained that the concept of ‘co-creation’ is fundamental to that evolution.

“It means allowing your brand to be taken up by the public, so that they help you redefine some things you may not have noticed. Corporations feel that they lose control. However, what they gain is much more, because they gain loyalty, a fan base that feels part of the brand. It is not that companies are going to change everything just because I say so, but the fact that they are listening, the fact that I know they are paying attention to me, makes the difference,” he told EL VOCERO.

Marino gave his talk ‘Social and Digital Impact on Reputation and Growth’ to a varied group of directors, marketing managers, brand managers, and communicators during an event held by Voice Public Relations, the public relations division of the advertising agency Arteaga & Arteaga and the local affiliate of MSL Group.

The expert noted that small and medium-sized businesses benefit most from the correct use of social networks, because they are looking to grow and can cultivate more business, and also because they are more receptive to suggestions and to acting quickly.

During his presentation, he showed that approximately $8.5 billion was spent on digital marketing and advertising in 2010. However, companies are investing their money in solidifying their brand (81 percent), rather than in identifying new customers and opportunities (32 percent), improving products and services (29 percent), or creating new products and opportunities (29 percent).

I think local brands and companies from Puerto Rico are “hungry” for this type of information.
