Archive for the ‘Marketing Research’ Category

SEOs and University Education

February 29, 2008

Posters at SEOmoz are debating why the Internet is not taught at schools.

One poster claims: “I think all Universities are quite a ways off from this.” Others simply think this will never happen or that if it does, it will not be worth it.

These opinions are understandable, especially when universities have offered courses with “ecommerce”, “web marketing”, “ebusiness” and similar terms in their titles, and most of these are soooo outdated. Many are limited to explaining what a cookie is, Bayesian and game theory, and a few other topics that are not really that useful in the real world.

Here is a first-hand story. Back in 2002 I was hired by the Graduate School of Business of the University of Turabo in PR to teach the graduate course ECommerce Technology. It was the first time the course was offered as a core course for students pursuing a master’s degree. The problem: the syllabus and textbooks were sooo outdated, with case studies of companies that no longer existed. I was forced to redesign the entire course and its material.

Here is another first-hand story. Many students who took data mining at Polytechnic University of Puerto Rico (PUPR) before I was hired by the university complained that they did not learn anything useful because the lectures were all theory and no practice. This is something I tried to avoid when, back in October, I started to teach the Web Mining graduate course. It is the same approach I use for my other courses.

As for studying the Internet, as SEOmoz posters argue, it is not possible to study “The Internet”. When they say “Internet” they probably mean studying search marketing, SEO, or Web Analytics. It looks like an opportunity for other marketers to make some money out of their peers’ ignorance.

I know there are some SEOs already trying to squeeze money from their peers by offering college-type courses taught by “experts”. Don’t be gamed by these folks. Their “colleges” and “institutes” are not certified by any higher education body, like The Middle States Association, or by research funding organizations like NSF. These mostly look like scams and their diplomas are not even worth the ink on them.

As for the above claim of teaching SEO in colleges, there is a list of traditional schools already teaching updated web marketing, design, usability, and even accessibility courses. In fact, more and more grad schools are developing Web Mining and Web Marketing courses.

At PUPR I’m in the curriculum development arena, developing and teaching the following hands-on courses, all at the graduate level:

Search Engines Architecture (Spring - classes start next week; lectures and lab)
Web Mining (Winter - semester just ended; lectures only) *
Search Engines for Penetration Testing and Intelligence (Fall - next fall; lectures and lab) **

* This was a course on Web Mining, Business Intelligence, and Search Engines. The agenda and syllabus are available online.

** Just asked by the head of the EE&EC and CS Dept. to teach this one.

These ARE NOT paperless, online courses. The class meets in the computer lab building. We have plenty of computers and software to play with. I give all lectures using PowerPoint and smartboards. We study which Web business models work and which ones don’t. We check case studies from the Web. We dissect SEO myths. We teach why and how search engine algorithms and web analytics work, etc. I have grad students conducting projects or theses supported by grants from gov agencies like DoD. Some of these projects interface with SEO, Web Analytics, Business Intelligence, and Homeland Security.

In addition, we have an upcoming conference on these topics (October). I’m also pushing for a 2-year certificate on Web Marketing & Analytics with a local college.

And how about AIRWeb, where, as scholars and researchers, we dissect and test search engine spam strategies and find new ways to neutralize, minimize, or “kill” these techniques, many of them promoted by some among you?

SEOs: Definitely, we are not oblivious to Web marketing and your “world”.

The Power of Document Linearization

January 25, 2008

In http://www.miislita.com/fractals/keyword-density-optimization.html  I explained to the SEO community the concept of document linearization as part of document GAP analysis. Marketers learned what IR graduate students already know: that document linearization (i.e., markup removal) is just one component of document indexing.

Keyword distribution, word distances, phrase matching, etc. are obtained from the text stream that results from linearization, not from the apparent position of text as rendered by a browser and visually inspected by average end users. Document linearization debunks the common SEO Keyword Density Myth. The apparent distribution of words, as perceived when end users visually scan a document, is one thing; the actual word distribution, as parsed by a search engine, is another. The futility of computing KD values is quite obvious.

Here is a report from another SEO who recently discovered the power of document linearization:
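To make the point concrete, here is a minimal sketch of linearization using only Python's standard html.parser module. It is an illustration under simplifying assumptions, not a description of how any particular search engine indexes pages: the Linearizer class, the linearize helper, and the sample table are all hypothetical, and a real indexer would also handle scripts, styles, entities, and tokenization rules.

```python
from html.parser import HTMLParser


class Linearizer(HTMLParser):
    """Collects text nodes in source order and discards all markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.chunks.append(data)


def linearize(html):
    """Return the words of a page in the order a parser reads them."""
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()


# Hypothetical page: a reader scanning the columns sees "cheap flights"
# and "luxury hotels", but the source is written row by row.
PAGE = """
<table>
  <tr><td>cheap</td><td>luxury</td></tr>
  <tr><td>flights</td><td>hotels</td></tr>
</table>
"""

tokens = linearize(PAGE)
print(tokens)

# Word distance as measured on the linearized stream, not on the screen.
distance = tokens.index("flights") - tokens.index("cheap")
print(distance)
```

On screen "cheap" sits directly above "flights", yet in the linearized stream another word falls between them. Any positional statistic computed from the rendered layout would miss that.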

http://seo-gw.blogspot.com/2008/01/fractal-semantics-linearization.html

The testimonial is worth reading.

The post http://irthoughts.wordpress.com/2007/12/20/from-keyword-density-to-william-tuttes-legacy/  is also relevant these days.

Search for posts on keyword density: http://irthoughts.wordpress.com/?s=keyword+density

A PageRank-Rank Correlation?

November 20, 2007

On 11-16, Stephane Labert sent me a copy of an article that attempts to correlate Google’s PageRank with the rank of a document in this search engine’s result pages (SERPs).

Although Labert apparently worked hard on the piece, and deserves credit for that, I found the article disappointing on the grounds of its sampling, chosen regression model, and statistical analysis.

I gave Labert a few tips and things to look at, since my impression was that the article was not ready for prime time. My intention: to spare Labert unnecessary “harm”.

I was too late. Apparently, by the time I received it, the piece had already been sent to many known SEOs and webmasters. These included some IRW readers, among them expert cloaker Ralph Tegtmeier, aka fantomaster.

On 11-17 Tegtmeier blogged about it. He and other SEOs promptly called the article’s statistical analysis into question. I am not going to go over their reactions since I pretty much agree with their critiques. Besides, the main issues argued by Labert and these SEOs are not new at all and have been revisited many times. For those interested, reactions to Labert’s article can be read at the following links:

http://fantomaster.com/fantomNews/archives/2007/11/17/pagerank-evolution-and-serp-rankings-analyzed-evaluating-a-statistical-study/

http://sphinn.com/story/14452#wholecomment18087

http://www.timnash.co.uk/11/2007/lies-damn-lies-and-pagerank-statistics/

Rather than echoing their comments, I prefer to discuss the experimental design of Labert’s article:

Firstly, the sampling:

There is no full disclosure of how the data was collected. To be honest, this goes against the article’s credibility. Which queries were used? How many terms were used per query: 2, 3, 4…? Which query modes were used: AND (FINDALL), ANY (OR), EXACT, constraining modes…? This is important since many variables, including the query, can influence SERPs. None of this was disclosed in the article.

As mentioned, many variables affect ranking results, and some have interactions. Ignoring these interactions and then isolating one variable and plotting this against an X axis does not provide an accurate picture.

Secondly, the regression model:

Why was the data fitted to a linear model when it actually tends to be nonlinear? Why were apparent outliers included in the least-squares analysis? Which error analysis with respect to the slope was used to justify the inclusion or rejection of these apparent outliers? None of this was explained or reported.

Thirdly, variable dependencies:
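For readers who want to see what such an error analysis might look like, here is a minimal sketch of an ordinary least-squares fit that also reports the standard error of the slope and screens residuals for apparent outliers. The data below is made up for illustration and is NOT Labert’s; the fit_line helper and the "two residual standard deviations" cutoff are my own assumptions, and other screening rules are equally defensible.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x, with the standard error of b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation, with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return slope, intercept, se_slope, residuals, s


# Hypothetical data: SERP position vs. toolbar PageRank (made up).
RANK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
PAGERANK = [6, 5, 6, 4, 5, 4, 5, 3, 4, 4]

slope, intercept, se_slope, residuals, s = fit_line(RANK, PAGERANK)
print(f"slope = {slope:.4f} +/- {se_slope:.4f}")

# Crude screen (an assumption): flag points whose residual exceeds
# two residual standard deviations.
flagged = [x for x, e in zip(RANK, residuals) if abs(e) > 2 * s]
print("apparent outliers at ranks:", flagged)
```

Plotting the residuals against X is the usual next step: a systematic curve in that plot is the sign that a straight line was the wrong model to begin with.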

All graphs show a very small slope for the fitted regression line. This suggests that changes in the X-axis (Rank) produce only small changes in the Y-axis (PageRank), indicating that the variables are almost independent of one another, and that is despite the correlation coefficient value allegedly reported as close to 1.

Indeed, a correlation coefficient close to 1 is not enough. To investigate whether any two variables depend on one another, or whether there is a significant correlation between them, we need to do more than just look at a bunch of correlation coefficients. As a matter of fact, an almost flat straight line, orthogonal to the Y-axis, actually suggests orthogonality and variable independence.

To assess whether the correlation found is significant, one could conduct a two-tailed t-test with n – 2 degrees of freedom on the correlation coefficient at a defined confidence level. One states the null hypothesis that there is no correlation between X and Y, computes the experimental t-value, and compares it with tabulated values from t-test tables. If the absolute value of t-experimental is greater than t-table, the null hypothesis is rejected; that is, we conclude in such a case that a significant correlation does exist. This test was not reported, either.

Labert claims to have conducted more detailed research to support the aforementioned article’s claims. I look forward to reading that.
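The test is short enough to sketch in a few lines. For a sample correlation coefficient r computed from n pairs, the statistic is t = r·sqrt(n – 2)/sqrt(1 – r²). The example below uses made-up data (again, NOT Labert’s), and the helper names are hypothetical. The critical value 2.306 is the standard tabulated two-tailed value for 8 degrees of freedom at the 95% confidence level; with a different n or confidence level, look up the corresponding entry.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-value for H0: no correlation, with n - 2 degrees of freedom.

    Undefined when |r| is exactly 1 (division by zero).
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical data: SERP position vs. toolbar PageRank (made up).
RANK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
PAGERANK = [6, 5, 6, 4, 5, 4, 5, 3, 4, 4]

# Tabulated two-tailed critical value: 8 degrees of freedom, 95% level.
T_TABLE_DF8_95 = 2.306

r = pearson_r(RANK, PAGERANK)
t = t_statistic(r, len(RANK))
reject_null = abs(t) > T_TABLE_DF8_95

print(f"r = {r:.4f}, t = {t:.4f}, reject H0: {reject_null}")
```

Note that the comparison uses the absolute value of t, since the test is two-tailed and a negative correlation is as much a correlation as a positive one.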

ADSAM: Emotional Response Modeling

August 6, 2007

I have had the pleasure of learning about Dr. Jon Morris, Professor at the University of Florida and CEO of ADSAM. He specializes in Emotional Response Modeling. His company is at the leading edge of the field and has incredible research articles and studies applied to advertising and marketing. I highly recommend that those interested in emotional advertising read about his work.
